Abstract


Czaplewski, R. L. 1994. Variance approximations for assessments of classification accuracy. Res. Pap. RM-316. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Forest and Range Experiment Station. 29 p.

Variance approximations are derived for the weighted and unweighted kappa statistics, the conditional kappa statistic, and conditional probabilities. These statistics are useful to assess classification accuracy, such as accuracy of remotely sensed classifications in thematic maps when compared to a sample of reference classifications made in the field. Published variance approximations assume multinomial sampling errors, which implies simple random sampling where each sample unit is classified into one and only one mutually exclusive category with each of two classification methods. The variance approximations in this paper are useful for more general cases, such as reference data from multiphase or cluster sampling. As an example, these approximations are used to develop variance estimators for accuracy assessments with a stratified random sample of reference data.

Keywords: Kappa, remote sensing, photo-interpretation, stratified random sampling, cluster sampling, multiphase sampling, multivariate composite estimation, reference data, agreement.
Variance Approximations
for Assessments of Classification Accuracy

Raymond L. Czaplewski


INTRODUCTION 

Assessments of classification accuracy are important to remote sensing applications, as reviewed by Congalton and Mead (1983), Story and Congalton (1986), Rosenfield and Fitzpatrick-Lins (1986), Campbell (1987, pp. 334-365), Congalton (1991), and Stehman (1992). Monserud and Leemans (1992) consider the related problem of comparing different vegetation maps. Recent literature favors the kappa statistic as a method for assessing classification accuracy or agreement.

The kappa statistic, which is computed from a square contingency table, is a scalar measure of agreement between two classifiers. If one classifier is considered a reference that is without error, then the kappa statistic is a measure of classification accuracy. Kappa equals 1 for perfect agreement, and zero for agreement expected by chance alone. Figure 1 provides interpretations of the magnitude of the kappa statistic that have appeared in the literature. In addition to kappa, Fleiss (1981) suggests that conditional probabilities are useful when assessing the agreement between two different classifiers, and Bishop et al. (1975) suggest statistics that quantify the disagreement between classifiers.

Existing variance approximations for kappa assume multinomial sampling errors for the proportions in the contingency table; this implies simple random sampling where each sample unit is classified into one and only one mutually exclusive category with each of the two methods (Stehman 1992). This paper considers more general cases, such as reference data from stratified random sampling, multiphase sampling, cluster sampling, and multistage sampling.

[Figure 1: panels labeled Landis and Koch (1977), Fleiss (1981), and Monserud and Leemans (1992)]

Figure 1. Interpretations of kappa statistic as proposed in past literature. Landis and Koch (1977) characterize their interpretations as useful benchmarks, although they are arbitrary; they use clinical diagnoses from the epidemiological literature as examples. Fleiss (1981, p. 218) bases his interpretations on Landis and Koch (1977), and suggests that these interpretations are suitable for "most purposes." Monserud and Leemans (1992) use their interpretations for global vegetation maps.

KAPPA STATISTIC ($\kappa$)

The weighted kappa statistic ($\kappa_w$) was first proposed by Cohen (1968) to measure the agreement between two different classifiers or classification protocols. Let $p_{ij}$ represent the probability that a member of the population will be assigned into category $i$ by the first classifier and category $j$ by the second. Let $k$ be the number of categories in the classification system, which is the same for both classifiers. $\kappa_w$ is a scalar statistic that is a nonlinear function of all $k^2$ elements of the $k \times k$ contingency table, where $p_{ij}$ is the $ij$th element of the contingency table. Note that the sum of all $k^2$ elements of the contingency table equals 1:

$$\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij} = 1. \qquad [1]$$

Define $w_{ij}$ as the value which the user places on any partial agreement whenever a member of the population is assigned to category $i$ by the first classifier and category $j$ by the second classifier (Cohen 1968). Typically, the weights range from $0 \le w_{ij} \le 1$, with $w_{ii} = 1$ (Landis and Koch 1977, p. 163). For example, $w_{ij}$ might equal 0.67 if category $i$ represents the large size class and $j$ is the medium size class; if $r$ represents the small size class, then $w_{ir}$ might equal 0.33; and $w_{is}$ might equal 0.0 if $s$ represents any other classification. The unweighted kappa statistic uses $w_{ii} = 1$ and $w_{ij} = 0$ for $i \ne j$ (Fleiss 1981, p. 225), which means that the agreement must be exact to be valued by the user.

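Using the notation of Fleiss et al. (1969), let:

$$p_{i\cdot} = \sum_{j=1}^{k} p_{ij}, \qquad [2]$$

$$p_{\cdot j} = \sum_{i=1}^{k} p_{ij}, \qquad [3]$$

$$p_o = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{ij}, \qquad [4]$$

$$p_c = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{i\cdot}\,p_{\cdot j}. \qquad [5]$$

Using this notation, the weighted kappa statistic ($\kappa_w$) as defined by Cohen (1968) is given as:

$$\kappa_w = \frac{p_o - p_c}{1 - p_c}. \qquad [6]$$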
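Eqs. 2 to 6 translate directly into code. The following is a minimal sketch, not the author's implementation: it assumes the contingency table is supplied as a list of rows of proportions that already sum to 1 (Eq. 1), and the function name `weighted_kappa` is illustrative. The example values are the proportions and weights listed in table 1 (Fleiss et al. 1969).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def weighted_kappa(P: Matrix, W: Matrix) -> Tuple[float, float, float]:
    """Return (p_o, p_c, kappa_w) for a k x k table of proportions P and weights W."""
    k = len(P)
    row = [sum(P[i][j] for j in range(k)) for i in range(k)]  # p_i. (Eq. 2)
    col = [sum(P[i][j] for i in range(k)) for j in range(k)]  # p_.j (Eq. 3)
    p_o = sum(W[i][j] * P[i][j] for i in range(k) for j in range(k))  # Eq. 4
    p_c = sum(W[i][j] * row[i] * col[j] for i in range(k) for j in range(k))  # Eq. 5
    return p_o, p_c, (p_o - p_c) / (1.0 - p_c)  # Eq. 6


# Proportions and weights from table 1 (Fleiss et al. 1969, p. 324).
P = [[0.53, 0.05, 0.02],
     [0.11, 0.14, 0.05],
     [0.01, 0.06, 0.03]]
W = [[1.0, 0.0, 0.4444],
     [0.0, 1.0, 0.6667],
     [0.4444, 0.6667, 1.0]]
# The unweighted kappa is the special case in which W is the identity matrix.
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(weighted_kappa(P, W))   # about (0.7867, 0.5672, 0.5071)
print(weighted_kappa(P, I3))  # about (0.70, 0.475, 0.4286)
```

Passing the identity matrix as $\mathbf W$ gives the unweighted statistic, so one routine covers both cases.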

Estimated Weighted Kappa ($\hat\kappa_w$)

The true proportions $p_{ij}$ are not known in practice, and the true $\kappa_w$ must be estimated with estimated proportions in the contingency table ($\hat p_{ij}$):

$$\hat\kappa_w = \frac{\hat p_o - \hat p_c}{1 - \hat p_c}, \qquad [7]$$

where $\hat p_o$ and $\hat p_c$ are defined as in Eqs. 2, 3, 4, and 5, using $\hat p_{ij}$ in place of $p_{ij}$.

The true $\kappa_w$ equals the estimated $\hat\kappa_w$ plus an unknown random error $\varepsilon_\kappa$:

$$\kappa_w = \hat\kappa_w + \varepsilon_\kappa. \qquad [8]$$
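If $\hat\kappa_w$ is an unbiased estimate of $\kappa_w$, then $E[\varepsilon_\kappa] = 0$ and $E[\hat\kappa_w] = \kappa_w$. By definition, $E[\varepsilon_\kappa^2] = E[(\kappa_w - \hat\kappa_w)^2]$, and the variance of $\hat\kappa_w$ is:

$$\mathrm{Var}(\hat\kappa_w) = E[\varepsilon_\kappa^2] - E^2[\varepsilon_\kappa] = E[\varepsilon_\kappa^2]. \qquad [9]$$

Taylor Series Approximation for Var($\hat\kappa_w$)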


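$\hat\kappa_w$ is a nonlinear, multivariate function of the $k^2$ elements ($\hat p_{ij}$) in the contingency table (Eqs. 2, 3, 4, 5, and 6). The multivariate Taylor series approximation is used to produce an estimated variance $\mathrm{Var}(\hat\kappa_w)$. Let $\varepsilon_{ij} = (p_{ij} - \hat p_{ij})$, and let $\partial\kappa_w/\partial p_{ij}$ be the partial derivative of $\kappa_w$ with respect to $p_{ij}$ evaluated at $p_{ij} = \hat p_{ij}$. The multivariate Taylor series expansion (Deutch 1965, pp. 70-72) of $\kappa_w$ is:

$$\kappa_w = \hat\kappa_w + \sum_{i=1}^{k}\sum_{j=1}^{k} \varepsilon_{ij}\left.\frac{\partial\kappa_w}{\partial p_{ij}}\right|_{p_{ij}=\hat p_{ij}} + R, \qquad [10]$$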
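where $R$ is the remainder. In addition, assume that $\hat p_{ij}$ is nearly equal to $p_{ij}$ (i.e., $\hat p_{ij} \approx p_{ij}$); hence, $\varepsilon_{ij} \approx 0$ because $\varepsilon_{ij} = (p_{ij} - \hat p_{ij})$. The higher-order products of $\varepsilon_{ij}$ in the Taylor series expansion are therefore assumed to be much smaller than $\varepsilon_{ij}$, and the $R$ in Eq. 10 is assumed to be negligible. Once $R$ is ignored, Eq. 10 is linear with respect to all $\varepsilon_{ij} = (p_{ij} - \hat p_{ij})$.

The Taylor series expansion in Eq. 10 provides the following linear approximation after ignoring the remainder $R$:

$$\varepsilon_\kappa = \kappa_w - \hat\kappa_w \approx \sum_{i=1}^{k}\sum_{j=1}^{k} \varepsilon_{ij}\frac{\partial\kappa_w}{\partial p_{ij}}. \qquad [11]$$

From Eq. 11, the squared random error $\varepsilon_\kappa^2$ approximately equals:

$$\varepsilon_\kappa^2 \approx \sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} \varepsilon_{ij}\,\varepsilon_{rs}\,\frac{\partial\kappa_w}{\partial p_{ij}}\frac{\partial\kappa_w}{\partial p_{rs}}. \qquad [12]$$

From Eqs. 9 and 12, $\mathrm{Var}(\hat\kappa_w)$ is approximately:

$$\mathrm{Var}(\hat\kappa_w) \approx \sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} E[\varepsilon_{ij}\varepsilon_{rs}]\,\frac{\partial\kappa_w}{\partial p_{ij}}\frac{\partial\kappa_w}{\partial p_{rs}}. \qquad [13]$$

This corresponds to the approximation using the delta method (e.g., Mood et al. 1963, p. 181; Rao 1965, pp. 321-322). The partial derivatives needed for $\mathrm{Var}(\hat\kappa_w)$ in Eq. 13 are derived in the following section.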
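Eq. 13 is the usual delta-method quadratic form: the gradient of the statistic, weighted by the covariances of the estimation errors. As a minimal sketch, assuming the analytic derivatives are not yet available, the gradient below is approximated by central finite differences; the function names and the step size `h` are illustrative choices rather than part of the method.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], p: Sequence[float],
                       h: float = 1e-6) -> Vector:
    """Central-difference approximation of the partial derivatives of f at p."""
    grad = []
    for u in range(len(p)):
        up, down = list(p), list(p)
        up[u] += h
        down[u] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def delta_method_variance(f: Callable[[Vector], float], p: Sequence[float],
                          cov: Sequence[Sequence[float]]) -> float:
    """First-order variance of f(p_hat): sum over u, v of g_u * cov[u][v] * g_v (Eq. 13).

    cov[u][v] plays the role of E[e_u e_v] for the flattened errors e = p - p_hat.
    """
    g = numerical_gradient(f, p)
    m = len(p)
    return sum(g[u] * cov[u][v] * g[v] for u in range(m) for v in range(m))


# Toy check: f(q) = q0 * q1 at (2, 3) has gradient (3, 2); with an identity
# covariance matrix the delta-method variance is 3**2 + 2**2 = 13.
print(delta_method_variance(lambda q: q[0] * q[1], [2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Finite differences are convenient for checking, but the analytic derivatives derived next are exact and cheaper to evaluate.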


Partial Derivatives for Var($\hat\kappa_w$) Approximation

The partial derivative of $\kappa_w$ in Eq. 13 is derived by rewriting $\kappa_w$ as a function of $p_{ij}$. First, $p_o$ in Eq. 6 is expanded to isolate the $p_{ij}$ term using the definition of $p_o$ in Eq. 4:

$$p_o = w_{ij}\,p_{ij} + \sum_{\substack{r,s=1\\(r,s)\ne(i,j)}}^{k} w_{rs}\,p_{rs}. \qquad [14]$$

The partial derivative of $p_o$ with respect to $p_{ij}$ is simply:

$$\frac{\partial p_o}{\partial p_{ij}} = w_{ij}. \qquad [15]$$
As the next step in deriving the partial derivative of $\kappa_w$ in Eq. 13, $p_c$ in Eq. 6 is expanded to isolate the $p_{ij}$ term using the definition of $p_c$ in Eq. 5:

Finally, the partial derivative of $p_c$ with respect to $p_{ij}$ is simply:

$$\frac{\partial p_c}{\partial p_{ij}} = \sum_{s=1}^{k} w_{is}\,p_{\cdot s} + \sum_{r=1}^{k} w_{rj}\,p_{r\cdot}. \qquad [19]$$

The partial derivative of $\kappa_w$ (Eq. 6) with respect to $p_{ij}$ is determined with Eqs. 15 and 19:

$$\frac{\partial\kappa_w}{\partial p_{ij}} = \frac{w_{ij}(1-p_c) - (1-p_o)\,\partial p_c/\partial p_{ij}}{(1-p_c)^2}. \qquad [20]$$

The term in Eqs. 17 and 20 can be simplified:

Using the notation of Fleiss et al. (1969):

$$\bar w_{i\cdot} = \sum_{j=1}^{k} w_{ij}\,p_{\cdot j}, \qquad [22]$$

$$\bar w_{\cdot j} = \sum_{i=1}^{k} w_{ij}\,p_{i\cdot}. \qquad [23]$$

Replacing the simplified term from Eq. 21 into the partial derivative of $\kappa_w$ from Eq. 20:

$$\frac{\partial\kappa_w}{\partial p_{ij}} = \frac{w_{ij}(1-p_c) - (\bar w_{i\cdot} + \bar w_{\cdot j})(1-p_o)}{(1-p_c)^2}. \qquad [24]$$

Equation 24 contains terms in $p_{ij}$ that are embedded within the $\bar w_{i\cdot}$ and $\bar w_{\cdot j}$ terms (Eqs. 22 and 23). Any higher-order partial derivatives should therefore use Eq. 20 rather than Eq. 24.


First-Order Approximation of Var($\hat\kappa_w$)

The first-order variance approximation for $\hat\kappa_w$ is determined by combining Eqs. 13 and 24:

$$\mathrm{Var}(\hat\kappa_w) \approx \frac{1}{(1-p_c)^4}\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} E[\varepsilon_{ij}\varepsilon_{rs}]\,\big[w_{ij}(1-p_c) - (\bar w_{i\cdot}+\bar w_{\cdot j})(1-p_o)\big]\big[w_{rs}(1-p_c) - (\bar w_{r\cdot}+\bar w_{\cdot s})(1-p_o)\big]. \qquad [25]$$

The multinomial distribution is typically used for $E[\varepsilon_{ij}\varepsilon_{rs}]$ in Eq. 25 (see also Eq. 104). However, other types of covariance matrices are possible, such as the covariance matrix for a stratified random sample (see Eqs. 124 and 125), the sample covariance matrix for a simple random sample of cluster plots (see Eq. 105), or the estimation error covariance matrix for multivariate composite estimates with multiphase or multistage samples of reference data (see Czaplewski 1992).
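Because Eq. 25 separates the derivatives from the error covariances, one routine can serve any sampling design that supplies $E[\varepsilon_{ij}\varepsilon_{rs}]$. The following minimal sketch takes the covariance as a function argument; the multinomial choice shown is the simple random sampling case of Eqs. 46 and 47, given later in this paper. It assumes n = 200 reference units for the Fleiss et al. (1969) example: n is not listed in table 1, but with that value this sketch reproduces the tabulated Var($\hat\kappa_w$) = 0.003248 and Var($\hat\kappa$) = 0.002885. Names are illustrative.

```python
from typing import Callable, List

Matrix = List[List[float]]
# cov(i, j, r, s) returns E[e_ij * e_rs] for whatever sampling design is in use.
CovFn = Callable[[int, int, int, int], float]


def var_kappa_w(P: Matrix, W: Matrix, cov: CovFn) -> float:
    """First-order variance of weighted kappa (Eq. 25) for a supplied error covariance."""
    k = len(P)
    row = [sum(P[i]) for i in range(k)]
    col = [sum(P[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(W[i][j] * P[i][j] for i in range(k) for j in range(k))
    p_c = sum(W[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    w_row = [sum(W[i][j] * col[j] for j in range(k)) for i in range(k)]  # Eq. 22
    w_col = [sum(W[i][j] * row[i] for i in range(k)) for j in range(k)]  # Eq. 23
    # Bracketed factor of Eq. 25 (numerator of Eq. 24) for every cell.
    d = [[W[i][j] * (1 - p_c) - (w_row[i] + w_col[j]) * (1 - p_o)
          for j in range(k)] for i in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    total = sum(d[i][j] * cov(i, j, r, s) * d[r][s]
                for (i, j) in cells for (r, s) in cells)
    return total / (1 - p_c) ** 4


def multinomial_cov(P: Matrix, n: int) -> CovFn:
    """Covariances for simple random sampling of n units (Eqs. 46 and 47)."""
    def cov(i: int, j: int, r: int, s: int) -> float:
        if (i, j) == (r, s):
            return P[i][j] * (1 - P[i][j]) / n
        return -P[i][j] * P[r][s] / n
    return cov


P = [[0.53, 0.05, 0.02], [0.11, 0.14, 0.05], [0.01, 0.06, 0.03]]
W = [[1.0, 0.0, 4 / 9], [0.0, 1.0, 2 / 3], [4 / 9, 2 / 3, 1.0]]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(var_kappa_w(P, W, multinomial_cov(P, 200)))   # about 0.003248
print(var_kappa_w(P, I3, multinomial_cov(P, 200)))  # about 0.002885
```

For the other designs mentioned above, only the covariance function would change; such substitutes are not shown here.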

Var$_0$($\hat\kappa_w$) Assuming Chance Agreement

In many accuracy assessments, the null hypothesis is that the agreement between two different protocols is no greater than that expected by chance, which is stated more formally as the hypothesis that the row and column classifiers are independent. Under this hypothesis, the probability of a unit being classified as type $i$ with the first protocol is independent of the classification with the second protocol, and the following true population parameters are expected (Fleiss et al. 1969):

$$p_{ij} = p_{i\cdot}\,p_{\cdot j}. \qquad [26]$$


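Substituting Eq. 26 into Eq. 4 and using the definition of $p_c$ in Eq. 5, the hypothesized true value of $p_o$ under this null hypothesis is:

$$p_o = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{i\cdot}\,p_{\cdot j} = p_c. \qquad [27]$$

Substituting Eqs. 26 and 27 into Eq. 25, the approximate variance of $\hat\kappa_w$ expected under the null hypothesis is:

$$\mathrm{Var}_0(\hat\kappa_w) \approx \frac{1}{(1-p_c)^2}\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} E_0[\varepsilon_{ij}\varepsilon_{rs}]\,\big[w_{ij} - (\bar w_{i\cdot}+\bar w_{\cdot j})\big]\big[w_{rs} - (\bar w_{r\cdot}+\bar w_{\cdot s})\big]. \qquad [28]$$

The covariances $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ in Eq. 28 need to be estimated under the conditions of the null hypothesis, namely that $p_{ij} = p_{i\cdot}p_{\cdot j}$ (see Eqs. 113, 114, and 117).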


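Unweighted Kappa ($\hat\kappa$)

The unweighted kappa ($\kappa$) treats any lack of agreement between classifications as having no value or weight. $\kappa$ is used in remote sensing more often than the weighted kappa ($\kappa_w$). $\kappa$ is a special case of $\kappa_w$ (Fleiss et al. 1969), in which $w_{ii} = 1$ and $w_{ij} = 0$ for $i \ne j$. In this case, $\kappa$ is defined as in Eq. 6 (Fleiss et al. 1969) with the following intermediate terms in Eqs. 4, 5, 22, and 23 equal to:

$$p_o = \sum_{i=1}^{k} p_{ii}, \qquad [29]$$

$$p_c = \sum_{i=1}^{k} p_{i\cdot}\,p_{\cdot i}, \qquad [30]$$

$$\bar w_{i\cdot} = p_{\cdot i}, \qquad [31]$$

$$\bar w_{\cdot j} = p_{j\cdot}. \qquad [32]$$

Replacing Eqs. 29, 30, 31, and 32 into Var($\hat\kappa_w$) in Eq. 25, where $w_{ij} = 0$ if $i \ne j$ and $w_{ii} = 1$, the variance of the unweighted kappa is:

$$\mathrm{Var}(\hat\kappa) \approx \frac{1}{(1-p_c)^4}\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} E[\varepsilon_{ij}\varepsilon_{rs}]\,\big[w_{ij}(1-p_c) - (p_{\cdot i}+p_{j\cdot})(1-p_o)\big]\big[w_{rs}(1-p_c) - (p_{\cdot r}+p_{s\cdot})(1-p_o)\big]. \qquad [33]$$

Likewise, the variance of the unweighted kappa statistic under the null hypothesis of chance agreement is a simplification of Eq. 28 or 33. Under this null hypothesis, $p_{ij} = p_{i\cdot}p_{\cdot j}$ and $p_o = p_c$ (see Eq. 27):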

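$$\mathrm{Var}_0(\hat\kappa) \approx \frac{1}{(1-p_c)^2}\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} E_0[\varepsilon_{ij}\varepsilon_{rs}]\,(w_{ij} - p_{\cdot i} - p_{j\cdot})(w_{rs} - p_{\cdot r} - p_{s\cdot}). \qquad [34]$$

The covariances $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ in Eq. 34 need to be estimated under the conditions of the null hypothesis, namely that $p_{ij} = p_{i\cdot}p_{\cdot j}$ (see Eqs. 113, 114, and 117).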
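Eq. 34 needs only the marginal proportions, because under the null hypothesis the error covariances are built from the chance-agreement probabilities $p_{i\cdot}p_{\cdot j}$. A minimal sketch, assuming multinomial sampling so that $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ follows Eqs. 46 and 47 with $p_{i\cdot}p_{\cdot j}$ in place of $p_{ij}$. It uses the Bishop et al. (1975) example of table 2, with cell counts recovered by multiplying the tabulated proportions by n = 72 (for example, 0.2361 × 72 ≈ 17); under these assumptions the result is close to the Var$_0(\hat\kappa)$ = 0.007003 in that table. The function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def var0_unweighted_kappa(P: Matrix, n: int) -> float:
    """Eq. 34 with multinomial E0[e_ij e_rs] built from p_i. * p_.j (Eq. 26)."""
    k = len(P)
    row = [sum(P[i]) for i in range(k)]                       # p_i.
    col = [sum(P[i][j] for i in range(k)) for j in range(k)]  # p_.j
    p_c = sum(row[i] * col[i] for i in range(k))              # Eq. 30
    P0 = [[row[i] * col[j] for j in range(k)] for i in range(k)]
    # Factor (w_ij - p_.i - p_j.) with identity weights (Eqs. 31 and 32).
    c = [[(1.0 if i == j else 0.0) - col[i] - row[j] for j in range(k)]
         for i in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    total = 0.0
    for (i, j) in cells:
        for (r, s) in cells:
            if (i, j) == (r, s):
                e0 = P0[i][j] * (1 - P0[i][j]) / n   # Eq. 46 under the null
            else:
                e0 = -P0[i][j] * P0[r][s] / n        # Eq. 47 under the null
            total += c[i][j] * e0 * c[r][s]
    return total / (1 - p_c) ** 2


counts = [[17, 4, 8], [5, 12, 0], [10, 3, 13]]  # table 2 proportions x 72
P2 = [[x / 72 for x in r] for r in counts]
print(var0_unweighted_kappa(P2, 72))  # about 0.007003
```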

Note that the variance estimators in Eqs. 33 and 34 are approximations since they ignore higher-order terms in the Taylor series expansion (see Eqs. 10, 12, and 13). In the special case of simple random sampling, Stehman (1992) found that this approximation was satisfactory except for sample sizes of 60 or fewer reference plots; these results are based on Monte Carlo simulations with four hypothetical populations.


Matrix Formulation of $\kappa$ Variance Approximations

The formulae above can be expressed in matrix algebra, which facilitates numerical implementation with matrix algebra software.

Let $\mathbf P$ represent the $k \times k$ matrix in which the $ij$th element of $\mathbf P$ is the scalar $p_{ij}$. In remote sensing jargon, $\mathbf P$ is the "error matrix" or "confusion matrix." Note that $k$ is the number of categories in the classification system. Let $\mathbf p_{i\cdot}$ be the $k \times 1$ vector in which the $i$th element is the scalar $p_{i\cdot}$ (Eq. 2), and $\mathbf p_{\cdot j}$ be the $k \times 1$ vector in which the $j$th element is $p_{\cdot j}$ (Eq. 3). From Eqs. 2 and 3:

$$\mathbf p_{i\cdot} = \mathbf P\,\mathbf 1, \qquad [35]$$

$$\mathbf p_{\cdot j} = \mathbf P'\,\mathbf 1, \qquad [36]$$

where $\mathbf 1$ is the $k \times 1$ vector in which each element equals 1, and $\mathbf P'$ is the transpose of $\mathbf P$. The expected matrix of joint classification probabilities, analogous to $\mathbf P$, under the hypothesis of chance agreement between the two classifiers is the $k \times k$ matrix $\mathbf P_c$, where each element is the product of its corresponding marginals:

$$\mathbf P_c = \mathbf p_{i\cdot}\,\mathbf p_{\cdot j}'. \qquad [37]$$

Let $\mathbf W$ represent the $k \times k$ matrix in which the $ij$th element is $w_{ij}$ (i.e., the weight or "partial credit" for the agreement when an object is classified as category $i$ by one classifier and category $j$ by the other classifier). From Eqs. 4 and 5,

$$p_o = \mathbf 1'(\mathbf W \odot \mathbf P)\,\mathbf 1, \qquad [38]$$

$$p_c = \mathbf 1'(\mathbf W \odot \mathbf P_c)\,\mathbf 1, \qquad [39]$$

where $\odot$ represents element-by-element multiplication (i.e., the $ij$th element of $\mathbf A \odot \mathbf B$ is $a_{ij}b_{ij}$, and matrices $\mathbf A$ and $\mathbf B$ have the same dimensions). The weighted kappa statistic ($\kappa_w$) equals Eq. 6 with $p_o$ and $p_c$ defined in Eqs. 38 and 39.

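The approximate variance of $\hat\kappa_w$ can be described in matrix algebra by rewriting the $k \times k$ contingency table as a $k^2 \times 1$ vector, as suggested by Christensen (1991). First, rearrange the $k \times k$ matrix $\mathbf P$ into the following $k^2 \times 1$ vector denoted vec$\mathbf P$: if $\mathbf p_j$ is the $k \times 1$ column vector in which the $i$th element equals $p_{ij}$, then $\mathbf P = [\mathbf p_1\ \mathbf p_2\ \cdots\ \mathbf p_k]$, and vec$\mathbf P = [\mathbf p_1'\ \mathbf p_2'\ \cdots\ \mathbf p_k']'$. Let Cov(vec$\hat{\mathbf P}$) denote the $k^2 \times k^2$ covariance matrix for the estimate vec$\hat{\mathbf P}$ of vec$\mathbf P$, such that the $uv$th element of Cov(vec$\hat{\mathbf P}$) equals $E[(\mathrm{vec}\hat P_u - \mathrm{vec}P_u)(\mathrm{vec}\hat P_v - \mathrm{vec}P_v)]$, where vec$P_u$ represents the $u$th element of vec$\mathbf P$. Define the $k \times 1$ intermediate vectors:

$$\bar{\mathbf w}_{i\cdot} = \mathbf W\,\mathbf p_{\cdot j}, \qquad [40]$$

$$\bar{\mathbf w}_{\cdot j} = \mathbf W'\,\mathbf p_{i\cdot}, \qquad [41]$$

and, from Eq. 25, the $k^2 \times 1$ vector:

$$\mathbf d_k = (1-p_c)\,\mathrm{vec}\mathbf W - (1-p_o)\left[\mathrm{vec}(\bar{\mathbf w}_{i\cdot}\mathbf 1') + \mathrm{vec}(\mathbf 1\,\bar{\mathbf w}_{\cdot j}')\right], \qquad [42]$$

where vec$\mathbf W$ is the $k^2 \times 1$ vector version of the weighting matrix $\mathbf W$, which is analogous to vec$\mathbf P$ above. Here vec($\bar{\mathbf w}_{i\cdot}\mathbf 1'$) is $k$ stacked copies of $\bar{\mathbf w}_{i\cdot}$, and the element of vec($\mathbf 1\,\bar{\mathbf w}_{\cdot j}'$) that corresponds to cell $ij$ is $\bar w_{\cdot j}$. Examples of $\bar{\mathbf w}_{i\cdot}$, $\bar{\mathbf w}_{\cdot j}$, and $\mathbf d_k$ are given in tables 1 and 2.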
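The column-stacking convention matters when Eq. 42 is coded, because vec$\mathbf P$, vec$\mathbf W$, and $\mathbf d_k$ must share the same element order. A minimal sketch in plain Python, assuming the column-major vec defined above (element $u$ runs down column 1, then column 2, and so on) and the table 1 weights taken as exactly 4/9 and 2/3; the helper names are illustrative. With the table 1 data it should reproduce the ordering and values of the $\mathbf d_k$ column in that table.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def vec(A: Matrix) -> Vector:
    """Stack the columns of A: [a_11, a_21, ..., a_k1, a_12, ...]."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def d_vector(P: Matrix, W: Matrix) -> Vector:
    """The k^2 x 1 vector d_k of Eq. 42, in the same order as vec(P)."""
    k = len(P)
    p_row = [sum(P[i][j] for j in range(k)) for i in range(k)]  # Eq. 35: P 1
    p_col = [sum(P[i][j] for i in range(k)) for j in range(k)]  # Eq. 36: P' 1
    Pc = [[p_row[i] * p_col[j] for j in range(k)] for i in range(k)]  # Eq. 37
    p_o = sum(W[i][j] * P[i][j] for i in range(k) for j in range(k))  # Eq. 38
    p_c = sum(W[i][j] * Pc[i][j] for i in range(k) for j in range(k))  # Eq. 39
    w_row = [sum(W[i][j] * p_col[j] for j in range(k)) for i in range(k)]  # Eq. 40
    w_col = [sum(W[i][j] * p_row[i] for i in range(k)) for j in range(k)]  # Eq. 41
    # Element (i, j) of vec(w_row 1') is w_row[i]; of vec(1 w_col') it is w_col[j].
    return [(1 - p_c) * W[i][j] - (1 - p_o) * (w_row[i] + w_col[j])
            for j in range(k) for i in range(k)]


P = [[0.53, 0.05, 0.02], [0.11, 0.14, 0.05], [0.01, 0.06, 0.03]]
W = [[1.0, 0.0, 4 / 9], [0.0, 1.0, 2 / 3], [4 / 9, 2 / 3, 1.0]]
print(vec(P))  # [0.53, 0.11, 0.01, 0.05, 0.14, 0.06, 0.02, 0.05, 0.03]
print([round(x, 4) for x in d_vector(P, W)])
```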
Table 1. Example data¹ from Fleiss et al. (1969, p. 324) for weighted kappa ($\kappa_w$), including vectors used in matrix algebra formulation.

                                     Classifier A
Classifier B  Statistic          j=1      j=2      j=3      p_i.    w̄_i.
i=1           w_1j               1        0        0.4444   0.60    0.6944
              p_1j               0.53     0.05     0.02
              p_1. p_.j          0.39     0.15     0.06
              w̄_1. + w̄_.j        1.3389   1.0611   1.2611
i=2           w_2j               0        1        0.6667   0.30    0.3167
              p_2j               0.11     0.14     0.05
              p_2. p_.j          0.195    0.075    0.03
              w̄_2. + w̄_.j        0.9611   0.6833   0.8833
i=3           w_3j               0.4444   0.6667   1        0.10    0.5555
              p_3j               0.01     0.06     0.03
              p_3. p_.j          0.065    0.025    0.01
              w̄_3. + w̄_.j        1.2000   0.9222   1.1222
              p_.j               0.65     0.25     0.10     1.00
              w̄_.j               0.6444   0.3667   0.5667

Weighted $\kappa$ from Fleiss et al. (1969, p. 324): $\hat\kappa_w$ = 0.5071; Var($\hat\kappa_w$) = 0.003248; Var$_0$($\hat\kappa_w$) = 0.004269; $p_o$ = 0.7867; $p_c$ = 0.5672.

u   i   j   vecP   vecW     vec(w̄_i.1')   vec(1w̄_.j')   d_k
1   1   1   0.53   1        0.6944         0.6444         0.1472
2   2   1   0.11   0        0.3167         0.6444        -0.2050
3   3   1   0.01   0.4444   0.5555         0.6444        -0.0637
4   1   2   0.05   0        0.6944         0.3667        -0.2264
5   2   2   0.14   1        0.3167         0.3667         0.2870
6   3   2   0.06   0.6667   0.5555         0.3667         0.0918
7   1   3   0.02   0.4444   0.6944         0.5667        -0.0767
8   2   3   0.05   0.6667   0.3167         0.5667         0.1001
9   3   3   0.03   1        0.5555         0.5667         0.1934

Unweighted $\kappa$ from Fleiss et al. (1969, p. 326): $\hat\kappa$ = 0.4286; Var($\hat\kappa$) = 0.002885; Var$_0$($\hat\kappa$) = 0.003082; $p_o$ = 0.7000; $p_c$ = 0.4750.

u   i   j   vecP   vecW   vec(w̄_i.1')   vec(1w̄_.j')   d_k
1   1   1   0.53   1      0.65           0.60           0.150
2   2   1   0.11   0      0.25           0.60          -0.255
3   3   1   0.01   0      0.10           0.60          -0.210
4   1   2   0.05   0      0.65           0.30          -0.285
5   2   2   0.14   1      0.25           0.30           0.360
6   3   2   0.06   0      0.10           0.30          -0.120
7   1   3   0.02   0      0.65           0.10          -0.225
8   2   3   0.05   0      0.25           0.10          -0.105
9   3   3   0.03   1      0.10           0.10           0.465

¹ The covariance matrix for the estimated joint probabilities ($\hat p_{ij}$) is estimated assuming the multinomial distribution (see Eqs. 46, 47, 128, 130, 131, and 132).
Table 2. Example data¹ from Bishop et al. (1975, p. 397) for unweighted kappa ($\hat\kappa$), including vectors used in matrix formulation.

Example contingency table, Bishop et al. (1975, p. 397):

        j=1      j=2      j=3      p_i.
i=1     0.2361   0.0556   0.1111   0.4028
i=2     0.0694   0.1667   0        0.2361
i=3     0.1389   0.0417   0.1806   0.3611
p_.j    0.4444   0.2639   0.2917

n = 72; $\hat\kappa$ = 0.3623; Var($\hat\kappa$) = 0.008235²; Var$_0$($\hat\kappa$) = 0.007003.

Vectors used in matrix computations for Var($\hat\kappa$) and Var$_0$($\hat\kappa$):

ij    vecP     vecP_c   vecW   vec(w̄_i.1')   vec(1w̄_.j')
11    0.2361   0.1790   1      0.4444         0.4028
21    0.0694   0.1049   0      0.2639         0.4028
31    0.1389   0.1605   0      0.2917         0.4028
12    0.0556   0.1063   0      0.4444         0.2361
22    0.1667   0.0623   1      0.2639         0.2361
32    0.0417   0.0953   0      0.2917         0.2361
13    0.1111   0.1175   0      0.4444         0.3611
23    0.0000   0.0689   0      0.2639         0.3611
33    0.1806   0.1053   1      0.2917         0.3611

Covariance matrix for vec$\hat{\mathbf P}$ assuming multinomial distribution. See Cov(vec$\hat{\mathbf P}$) in Eq. 128.

ij      11       21       31       12       22       32       13       23       33
11    0.0025  -0.0002  -0.0005  -0.0002  -0.0005  -0.0001  -0.0004   0       -0.0006
21   -0.0002   0.0009  -0.0001  -0.0001  -0.0002  -0.0000  -0.0001   0       -0.0002
31   -0.0005  -0.0001   0.0017  -0.0001  -0.0003  -0.0001  -0.0002   0       -0.0003
12   -0.0002  -0.0001  -0.0001   0.0007  -0.0001  -0.0000  -0.0001   0       -0.0001
22   -0.0005  -0.0002  -0.0003  -0.0001   0.0019  -0.0001  -0.0003   0       -0.0004
32   -0.0001  -0.0000  -0.0001  -0.0000  -0.0001   0.0006  -0.0001   0       -0.0001
13   -0.0004  -0.0001  -0.0002  -0.0001  -0.0003  -0.0001   0.0014   0       -0.0003
23    0        0        0        0        0        0        0        0        0
33   -0.0006  -0.0002  -0.0003  -0.0001  -0.0004  -0.0001  -0.0003   0        0.0021

Covariance matrix for vec$\hat{\mathbf P}$ under null hypothesis of independence between the row and column classifiers and the multinomial distribution. See Cov$_0$(vec$\hat{\mathbf P}$) in Eqs. 130, 131, and 132.

ij      11       21       31       12       22       32       13       23       33
11    0.0020  -0.0003  -0.0004  -0.0003  -0.0002  -0.0002  -0.0003  -0.0002  -0.0003
21   -0.0003   0.0013  -0.0002  -0.0002  -0.0001  -0.0001  -0.0002  -0.0001  -0.0002
31   -0.0004  -0.0002   0.0019  -0.0002  -0.0001  -0.0002  -0.0003  -0.0002  -0.0002
12   -0.0003  -0.0002  -0.0002   0.0013  -0.0001  -0.0001  -0.0002  -0.0001  -0.0002
22   -0.0002  -0.0001  -0.0001  -0.0001   0.0008  -0.0001  -0.0001  -0.0001  -0.0001
32   -0.0002  -0.0001  -0.0002  -0.0001  -0.0001   0.0012  -0.0002  -0.0001  -0.0001
13   -0.0003  -0.0002  -0.0003  -0.0002  -0.0001  -0.0002   0.0014  -0.0001  -0.0002
23   -0.0002  -0.0001  -0.0002  -0.0001  -0.0001  -0.0001  -0.0001   0.0009  -0.0001
33   -0.0003  -0.0002  -0.0002  -0.0002  -0.0001  -0.0001  -0.0002  -0.0001   0.0013

¹ The covariance matrix for the estimated joint probabilities ($\hat p_{ij}$) is estimated assuming the multinomial distribution (see Eqs. 46, 47, 128, 130, 131, and 132).

² Var($\hat\kappa$) = 0.0101 in Bishop et al. (1975) is a computational error. The correct Var($\hat\kappa$) is 0.008235 (Hudson and Ramm 1987).


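The approximate variance of $\hat\kappa_w$ expressed in matrix algebra is:

$$\mathrm{Var}(\hat\kappa_w) = \big[\mathbf d_k'\,\mathrm{Cov}(\mathrm{vec}\hat{\mathbf P})\,\mathbf d_k\big]/(1-p_c)^4. \qquad [43]$$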

See Eqs. 104 and 105 for examples of Cov(vec$\hat{\mathbf P}$). The variance estimator in Eq. 43 is equivalent to the estimator in Eq. 25. Tables 1 and 2 provide examples.

The structure of Eq. 43 reflects its origin as a linear approximation, in which the estimation error of $\hat\kappa_w$ is approximately the linear combination $(\mathrm{vec}\mathbf P - \mathrm{vec}\hat{\mathbf P})'\mathbf d_k/(1-p_c)^2$. This suggests different and more accurate variance approximations that use higher-order terms in the multivariate Taylor series approximation for the $\mathbf d_k$ vector; the author will explore these types of approximations in the future.
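Eq. 43 is a single quadratic form once $\mathbf d_k$ and Cov(vec$\hat{\mathbf P}$) are built. A minimal sketch, assuming multinomial sampling for the covariance matrix (Eqs. 46 and 47) and, for the null case, the substitutions of Eqs. 44 and 45 (chance-agreement probabilities vec$\mathbf P_c$ in the covariance and $p_c$ in place of $p_o$). The data are the Bishop et al. (1975) example of table 2, with counts recovered as proportion × 72. Under these assumptions the sketch should agree with table 2 (0.008235 and 0.007003). Names are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def vec(A: Matrix) -> Vector:
    """Column-stacking vec operator used for vecP, vecPc and vecW."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def multinomial_cov_matrix(p: Vector, n: int) -> Matrix:
    """k^2 x k^2 covariance of vec(P_hat) under multinomial sampling (Eqs. 46-47)."""
    m = len(p)
    return [[((p[u] if u == v else 0.0) - p[u] * p[v]) / n for v in range(m)]
            for u in range(m)]


def quad_form(d: Vector, C: Matrix) -> float:
    """Compute d' C d."""
    m = len(d)
    return sum(d[u] * C[u][v] * d[v] for u in range(m) for v in range(m))


def var_kappa_matrix(P: Matrix, W: Matrix, n: int, null: bool = False) -> float:
    """Eq. 43 (null=False) or Eqs. 44-45 (null=True) with multinomial covariances."""
    k = len(P)
    row = [sum(P[i]) for i in range(k)]
    col = [sum(P[i][j] for i in range(k)) for j in range(k)]
    Pc = [[row[i] * col[j] for j in range(k)] for i in range(k)]  # Eq. 37
    p_c = sum(W[i][j] * Pc[i][j] for i in range(k) for j in range(k))
    # Under the null hypothesis p_o is replaced by p_c (Eq. 27).
    p_o = p_c if null else sum(W[i][j] * P[i][j] for i in range(k) for j in range(k))
    w_row = [sum(W[i][j] * col[j] for j in range(k)) for i in range(k)]  # Eq. 40
    w_col = [sum(W[i][j] * row[i] for i in range(k)) for j in range(k)]  # Eq. 41
    d = [(1 - p_c) * W[i][j] - (1 - p_o) * (w_row[i] + w_col[j])
         for j in range(k) for i in range(k)]                           # Eq. 42 or 44
    C = multinomial_cov_matrix(vec(Pc) if null else vec(P), n)
    return quad_form(d, C) / (1 - p_c) ** 4


counts = [[17, 4, 8], [5, 12, 0], [10, 3, 13]]  # table 2 proportions x 72
P2 = [[x / 72 for x in r] for r in counts]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(var_kappa_matrix(P2, I3, 72))             # about 0.008235
print(var_kappa_matrix(P2, I3, 72, null=True))  # about 0.007003
```

Plain lists keep the sketch dependency-free; with a matrix package the same quadratic form is a single expression.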

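The variance of $\hat\kappa_w$ under the null hypothesis of chance agreement, i.e., Var$_0$($\hat\kappa_w$) in Eq. 28, is expressed by replacing $p_o$ with $p_c$ in Eq. 42:

$$\mathbf d_{k=0} = (1-p_c)\left\{\mathrm{vec}\mathbf W - \left[\mathrm{vec}(\bar{\mathbf w}_{i\cdot}\mathbf 1') + \mathrm{vec}(\mathbf 1\,\bar{\mathbf w}_{\cdot j}')\right]\right\}, \qquad [44]$$

$$\mathrm{Var}_0(\hat\kappa_w) = \big[\mathbf d_{k=0}'\,\mathrm{Cov}_0(\mathrm{vec}\hat{\mathbf P})\,\mathbf d_{k=0}\big]/(1-p_c)^4. \qquad [45]$$

Tables 1 and 2 provide examples of $\mathbf d_{k=0}$ and Var$_0$($\hat\kappa_w$) for the multinomial distribution. The covariance matrix Cov$_0$(vec$\hat{\mathbf P}$) in Eq. 45 must be estimated under the conditions of the null hypothesis, namely that $E[\hat p_{ij}] = p_{i\cdot}p_{\cdot j}$ (see Eqs. 113, 114, and 117).

The estimated variances for the unweighted $\kappa$ statistics, e.g., Var($\hat\kappa$) in Eq. 33 and Var$_0$($\hat\kappa$) in Eq. 34, can be computed with matrix Eqs. 43 and 45 using $\mathbf W = \mathbf I$ in Eqs. 38, 39, 40, 41, and 42, where $\mathbf I$ is the $k \times k$ identity matrix. Tables 1 and 2 provide examples.


Verification with Multinomial Distribution

The variance approximation Var($\hat\kappa_w$) in Eq. 25 has been derived by Everitt (1968) and Fleiss et al. (1969) for the special case of simple random sampling, in which each sample unit is independently classified into one and only one mutually exclusive category using each of two classifiers. In the case of simple random sampling, the multinomial or multivariate hypergeometric distributions provide the covariance matrix for $E[\varepsilon_{ij}\varepsilon_{rs}]$ in Eq. 25. The purpose of this section is to verify that Eq. 25 includes the results of Fleiss et al. (1969) in the special case of the multinomial distribution.

The covariance matrix for the multinomial distribution is given by Ratnaparkhi (1985) as follows:

$$\mathrm{Cov}(\hat p_{ij},\hat p_{ij}) = \mathrm{Var}(\hat p_{ij}) = E[\varepsilon_{ij}^2] - E^2[\varepsilon_{ij}] = \frac{p_{ij}(1-p_{ij})}{n}, \qquad [46]$$

$$\mathrm{Cov}(\hat p_{ij},\hat p_{rs}) = E[\varepsilon_{ij}\varepsilon_{rs}] - E[\varepsilon_{ij}]\,E[\varepsilon_{rs}] = -\frac{p_{ij}\,p_{rs}}{n}, \quad (r,s)\ne(i,j). \qquad [47]$$

Equation 104 expresses these covariances in matrix form.

Replacing Eqs. 46 and 47 into the Var($\hat\kappa_w$) from Eqs. 13 and 24:

$$\begin{aligned}
\mathrm{Var}(\hat\kappa_w) &\approx \sum_{i=1}^{k}\sum_{j=1}^{k}\left(\frac{\partial\kappa_w}{\partial p_{ij}}\right)^2\frac{p_{ij}(1-p_{ij})}{n} - \sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{\substack{r,s=1\\(r,s)\ne(i,j)}}^{k}\frac{\partial\kappa_w}{\partial p_{ij}}\frac{\partial\kappa_w}{\partial p_{rs}}\frac{p_{ij}\,p_{rs}}{n} \\
&= \frac{1}{n}\left\{\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\left(\frac{\partial\kappa_w}{\partial p_{ij}}\right)^2 - \left[\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\frac{\partial\kappa_w}{\partial p_{ij}}\right]^2\right\}. \qquad [48]
\end{aligned}$$

The following term in Var($\hat\kappa_w$) from Eq. 48 can be simplified using the definition of $p_o$ in Eq. 4, and the definition of $p_{i\cdot}$ and $p_{\cdot j}$ in Eqs. 2 and 3:

$$\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\frac{\partial\kappa_w}{\partial p_{ij}} = \frac{p_o(1-p_c) - (1-p_o)\left(\sum_{i=1}^{k} p_{i\cdot}\bar w_{i\cdot} + \sum_{j=1}^{k} p_{\cdot j}\bar w_{\cdot j}\right)}{(1-p_c)^2}. \qquad [49]$$

From Eqs. 5, 22, and 23,

$$\sum_{i=1}^{k} p_{i\cdot}\bar w_{i\cdot} = \sum_{i=1}^{k} p_{i\cdot}\left(\sum_{j=1}^{k} w_{ij}\,p_{\cdot j}\right) = p_c, \qquad [50]$$

$$\sum_{j=1}^{k} p_{\cdot j}\bar w_{\cdot j} = \sum_{j=1}^{k} p_{\cdot j}\left(\sum_{i=1}^{k} w_{ij}\,p_{i\cdot}\right) = p_c. \qquad [51]$$

Using the definition of $p_c$ in Eqs. 50 and 51, Eq. 49 simplifies to:

$$\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\frac{\partial\kappa_w}{\partial p_{ij}} = \frac{p_o(1-p_c) - 2p_c(1-p_o)}{(1-p_c)^2} = \frac{p_o p_c - 2p_c + p_o}{(1-p_c)^2}. \qquad [52]$$

Likewise, the following term in Var($\hat\kappa_w$) from Eq. 48 is derived directly from Eq. 24:

$$\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\left(\frac{\partial\kappa_w}{\partial p_{ij}}\right)^2 = \sum_{i=1}^{k}\sum_{j=1}^{k}\frac{p_{ij}\big[w_{ij}(1-p_c) - (\bar w_{i\cdot}+\bar w_{\cdot j})(1-p_o)\big]^2}{(1-p_c)^4}. \qquad [53]$$

Substituting Eqs. 52 and 53 back into Eq. 48:

$$\mathrm{Var}(\hat\kappa_w) \approx \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\big[w_{ij}(1-p_c) - (\bar w_{i\cdot}+\bar w_{\cdot j})(1-p_o)\big]^2 - (p_o p_c - 2p_c + p_o)^2}{n(1-p_c)^4}, \qquad [54]$$

which agrees with the results of Fleiss et al. (1969). This partially validates the more general variance approximation Var($\hat\kappa_w$) in Eq. 25.

Likewise, Var$_0$($\hat\kappa_w$) in Eq. 28 can be shown to be equal to the results of Fleiss et al. (1969, Eq. 9) for the multinomial distribution using Eqs. 1 and 50 and the following identity:

$$\sum_{i=1}^{k}\sum_{j=1}^{k} \bar w_{\cdot j}\,p_{i\cdot}\,p_{\cdot j} = \sum_{j=1}^{k}\bar w_{\cdot j}\,p_{\cdot j} = p_c. \qquad [55]$$

In the special case of the multinomial distribution, Var($\hat\kappa$) in Eq. 33 agrees with Fleiss et al. (1969, Eq. 13), where Eqs. 29 and 30 and the following three identities are used in Eq. 33:

$$\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{k} p_{ij}\,p_{rs}\,(p_{\cdot i}+p_{j\cdot})(p_{\cdot r}+p_{s\cdot}) = 4p_c^2,$$

$$\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{ij} = \sum_{i=1}^{k} p_{ii} = p_o,$$

$$\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}\big[w_{ij}(1-p_c) - (p_{\cdot i}+p_{j\cdot})(1-p_o)\big]^2 = p_o(1-p_c)^2 - 2(1-p_o)(1-p_c)\sum_{i=1}^{k} p_{ii}(p_{\cdot i}+p_{i\cdot}) + (1-p_o)^2\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij}(p_{\cdot i}+p_{j\cdot})^2.$$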


Examples given by Fleiss et al. (1969) and Bishop et al. (1975) were used to further validate the variance approximations, although this validation is limited by its empirical nature and use of the multinomial distribution. Results are in tables 1 and 2.
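Eq. 54 needs no covariance matrix, so it offers an independent cross-check of the general approximations against the tables. A minimal sketch for multinomial sampling, assuming n = 200 for the Fleiss et al. (1969) example (a value not stated in table 1 but consistent with its variances) and the table 1 weights taken as exactly 4/9 and 2/3; the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def var_kappa_w_fleiss(P: Matrix, W: Matrix, n: int) -> float:
    """Closed-form multinomial variance of weighted kappa (Eq. 54)."""
    k = len(P)
    row = [sum(P[i]) for i in range(k)]
    col = [sum(P[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(W[i][j] * P[i][j] for i in range(k) for j in range(k))
    p_c = sum(W[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    w_row = [sum(W[i][j] * col[j] for j in range(k)) for i in range(k)]  # Eq. 22
    w_col = [sum(W[i][j] * row[i] for i in range(k)) for j in range(k)]  # Eq. 23
    # Sum in the numerator of Eq. 54 (equivalently Eq. 53 times (1 - p_c)^4).
    first = sum(P[i][j] * (W[i][j] * (1 - p_c) - (w_row[i] + w_col[j]) * (1 - p_o)) ** 2
                for i in range(k) for j in range(k))
    # Squared numerator of Eq. 52.
    second = (p_o * p_c - 2 * p_c + p_o) ** 2
    return (first - second) / (n * (1 - p_c) ** 4)


P = [[0.53, 0.05, 0.02], [0.11, 0.14, 0.05], [0.01, 0.06, 0.03]]
W = [[1.0, 0.0, 4 / 9], [0.0, 1.0, 2 / 3], [4 / 9, 2 / 3, 1.0]]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(var_kappa_w_fleiss(P, W, 200))   # about 0.003248
print(var_kappa_w_fleiss(P, I3, 200))  # about 0.002885
```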

In a similar empirical evaluation, Eq. 43 for the unweighted kappa ($\kappa_w$, $\mathbf W = \mathbf I$) agrees with the unpublished results of Stephen Stehman (personal communication) for stratified sampling in the 3 × 3 case when used with the covariance matrix in Eqs. 124 and 133 (after transpositions to change stratification to the column classifier as in Eq. 123 and using the finite population correction factor).

CONDITIONAL KAPPA ($\kappa_{i\cdot}$) FOR ROW $i$

Light (1971) considers the partition of the overall coefficient of agreement ($\kappa$) into a set of $k$ partial $\kappa$ statistics, each of which quantitatively describes the agreement for one category in the classification system. For example, assume that the rows of the contingency table represent the true reference classification. The "conditional kappa" ($\kappa_{i\cdot}$) is a coefficient of agreement given that the row classification is category $i$ (Bishop et al. 1975, p. 397):

$$\kappa_{i\cdot} = \frac{p_{ii} - p_{i\cdot}\,p_{\cdot i}}{p_{i\cdot} - p_{i\cdot}\,p_{\cdot i}}. \qquad [56]$$
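A minimal sketch of Eq. 56 for every row category at once, assuming that rows hold the reference classification as in the text. It uses the table 2 example (Bishop et al. 1975) with counts recovered as proportion × 72; the third value corresponds to the $\hat\kappa_{3\cdot}$ = 0.2941 discussed later in this section. The function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conditional_kappa_rows(P: Matrix) -> List[float]:
    """Row conditional kappa (Eq. 56) for each category i."""
    k = len(P)
    row = [sum(P[i]) for i in range(k)]                       # p_i.
    col = [sum(P[i][j] for i in range(k)) for j in range(k)]  # p_.j
    return [(P[i][i] - row[i] * col[i]) / (row[i] - row[i] * col[i])
            for i in range(k)]


counts = [[17, 4, 8], [5, 12, 0], [10, 3, 13]]  # table 2 proportions x 72
P2 = [[x / 72 for x in r] for r in counts]
print([round(x, 4) for x in conditional_kappa_rows(P2)])
```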


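The Taylor series approximation of Eq. 56 is made using Eq. 10. Only the elements in row $i$ and column $i$ appear in Eq. 56, so the partial derivatives with respect to all other $p_{rs}$ are zero:

$$\kappa_{i\cdot} = \hat\kappa_{i\cdot} + \sum_{j=1}^{k}\varepsilon_{ij}\left.\frac{\partial\kappa_{i\cdot}}{\partial p_{ij}}\right|_{p_{ij}=\hat p_{ij}} + \sum_{\substack{j=1\\ j\ne i}}^{k}\varepsilon_{ji}\left.\frac{\partial\kappa_{i\cdot}}{\partial p_{ji}}\right|_{p_{ji}=\hat p_{ji}} + R. \qquad [57]$$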
$p_{ij}$ is factored out of Eq. 56 to compute the partial derivatives in Eq. 57. First, define

$$\alpha_{i\cdot|u} = \sum_{\substack{j=1\\ j\ne u}}^{k} p_{ij} = p_{i\cdot} - p_{iu}, \qquad [58]$$

$$\alpha_{\cdot i|u} = \sum_{\substack{j=1\\ j\ne u}}^{k} p_{ji} = p_{\cdot i} - p_{ui}. \qquad [59]$$

Substituting Eqs. 58 and 59 into Eq. 56, $\kappa_{i\cdot}$ can be rewritten as a function of $p_{ii}$ and differentiated for the first term of the Taylor series approximation in Eq. 57:


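$$\kappa_{i\cdot} = \frac{p_{ii} - (\alpha_{i\cdot|i}+p_{ii})(\alpha_{\cdot i|i}+p_{ii})}{(\alpha_{i\cdot|i}+p_{ii}) - (\alpha_{i\cdot|i}+p_{ii})(\alpha_{\cdot i|i}+p_{ii})} = \frac{-\alpha_{i\cdot|i}\,\alpha_{\cdot i|i} + (1-\alpha_{i\cdot|i}-\alpha_{\cdot i|i})\,p_{ii} - p_{ii}^2}{(\alpha_{i\cdot|i} - \alpha_{i\cdot|i}\,\alpha_{\cdot i|i}) + (1-\alpha_{i\cdot|i}-\alpha_{\cdot i|i})\,p_{ii} - p_{ii}^2}, \qquad [60]$$

$$\frac{\partial\kappa_{i\cdot}}{\partial p_{ii}} = \frac{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})(1-p_{i\cdot}-p_{\cdot i}) - (p_{ii}-p_{i\cdot}p_{\cdot i})(1-p_{i\cdot}-p_{\cdot i})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2} = \frac{(p_{i\cdot}-p_{ii})(1-p_{i\cdot}-p_{\cdot i})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2}. \qquad [61]$$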


Similarly, $\kappa_{i\cdot}$ can be rewritten as a function of $p_{ij}$ or $p_{ji}$, $i \ne j$, then differentiated for the other terms in the Taylor series approximation in Eq. 57:


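$$\kappa_{i\cdot} = \frac{p_{ii} - (\alpha_{i\cdot|j}+p_{ij})\,p_{\cdot i}}{(\alpha_{i\cdot|j}+p_{ij}) - (\alpha_{i\cdot|j}+p_{ij})\,p_{\cdot i}}, \qquad [62]$$

$$\frac{\partial\kappa_{i\cdot}}{\partial p_{ij}} = \frac{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})(-p_{\cdot i}) - (p_{ii}-p_{i\cdot}p_{\cdot i})(1-p_{\cdot i})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2} = \frac{-p_{ii}(1-p_{\cdot i})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2}, \qquad [63]$$

$$\kappa_{i\cdot} = \frac{p_{ii} - p_{i\cdot}(\alpha_{\cdot i|j}+p_{ji})}{p_{i\cdot} - p_{i\cdot}(\alpha_{\cdot i|j}+p_{ji})}, \qquad [64]$$

$$\frac{\partial\kappa_{i\cdot}}{\partial p_{ji}} = \frac{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})(-p_{i\cdot}) - (p_{ii}-p_{i\cdot}p_{\cdot i})(-p_{i\cdot})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2} = \frac{-(p_{i\cdot}-p_{ii})\,p_{i\cdot}}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2}. \qquad [65]$$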


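Replacing the partial derivatives in Eqs. 61, 63, and 65, which are evaluated at $p_{ij} = \hat p_{ij}$, into the Taylor series approximation in Eq. 57:

$$\kappa_{i\cdot} \approx \hat\kappa_{i\cdot} + \varepsilon_{ii}\frac{(p_{i\cdot}-p_{ii})(1-p_{i\cdot}-p_{\cdot i})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2} - \sum_{j\ne i}\varepsilon_{ij}\frac{p_{ii}(1-p_{\cdot i})}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2} - \sum_{j\ne i}\varepsilon_{ji}\frac{(p_{i\cdot}-p_{ii})\,p_{i\cdot}}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^2}. \qquad [66]$$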


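$$\begin{aligned}
\mathrm{Var}(\hat\kappa_{i\cdot}) \approx E[(\kappa_{i\cdot}-\hat\kappa_{i\cdot})^2] \approx \frac{1}{(p_{i\cdot}-p_{i\cdot}p_{\cdot i})^4}\Big\{ & (p_{i\cdot}-p_{ii})^2(1-p_{i\cdot}-p_{\cdot i})^2\,E[\varepsilon_{ii}^2] \\
&+ p_{ii}^2(1-p_{\cdot i})^2\sum_{j\ne i}\sum_{s\ne i}E[\varepsilon_{ij}\varepsilon_{is}] \\
&+ p_{i\cdot}^2(p_{i\cdot}-p_{ii})^2\sum_{j\ne i}\sum_{s\ne i}E[\varepsilon_{ji}\varepsilon_{si}] \\
&- 2(p_{i\cdot}-p_{ii})(1-p_{i\cdot}-p_{\cdot i})\,p_{ii}(1-p_{\cdot i})\sum_{j\ne i}E[\varepsilon_{ii}\varepsilon_{ij}] \\
&- 2(1-p_{i\cdot}-p_{\cdot i})(p_{i\cdot}-p_{ii})^2\,p_{i\cdot}\sum_{j\ne i}E[\varepsilon_{ii}\varepsilon_{ji}] \\
&+ 2p_{ii}(1-p_{\cdot i})(p_{i\cdot}-p_{ii})\,p_{i\cdot}\sum_{j\ne i}\sum_{s\ne i}E[\varepsilon_{ij}\varepsilon_{si}]\Big\}. \qquad [67]
\end{aligned}$$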
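A minimal sketch of Eq. 67 for multinomial sampling. Rather than expanding the six covariance terms by hand, it builds the full gradient from Eqs. 61, 63, and 65 (zero for cells outside row $i$ and column $i$) and evaluates the quadratic form of Eq. 13 with the covariances of Eqs. 46 and 47, which is algebraically the same quantity. Assumptions: rows are the reference classification, categories are indexed from 0 in the code, and the table 2 counts are recovered as proportion × 72. The names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def var_conditional_kappa_row(P: Matrix, i: int, n: int) -> float:
    """First-order variance of the row-i conditional kappa (Eq. 67), multinomial errors."""
    k = len(P)
    p_i = sum(P[i])                         # p_i.
    p_ci = sum(P[a][i] for a in range(k))   # p_.i
    p_ii = P[i][i]
    denom = (p_i - p_i * p_ci) ** 2
    g = [[0.0] * k for _ in range(k)]       # cells outside row/column i stay 0
    g[i][i] = (p_i - p_ii) * (1 - p_i - p_ci) / denom   # Eq. 61
    for j in range(k):
        if j != i:
            g[i][j] = -p_ii * (1 - p_ci) / denom         # Eq. 63
            g[j][i] = -p_i * (p_i - p_ii) / denom         # Eq. 65
    cells = [(a, b) for a in range(k) for b in range(k)]
    total = 0.0
    for (a, b) in cells:
        for (r, s) in cells:
            same = 1.0 if (a, b) == (r, s) else 0.0
            cov = (same * P[a][b] - P[a][b] * P[r][s]) / n  # Eqs. 46 and 47
            total += g[a][b] * cov * g[r][s]
    return total


counts = [[17, 4, 8], [5, 12, 0], [10, 3, 13]]  # table 2 proportions x 72
P2 = [[x / 72 for x in r] for r in counts]
row3, col3 = sum(P2[2]), sum(r[2] for r in P2)
kappa_3 = (P2[2][2] - row3 * col3) / (row3 - row3 * col3)   # Eq. 56
var_3 = var_conditional_kappa_row(P2, 2, 72)
half_width = 1.96 * math.sqrt(var_3)
print(round(kappa_3, 4), round(var_3, 4),
      (round(kappa_3 - half_width, 3), round(kappa_3 + half_width, 3)))
```

The 95% interval from this sketch agrees with the published interval [0.078, 0.510] to within about 0.001.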


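The validity of the approximation in Eq. 67 was partially checked using the example provided by Bishop et al. (1975, p. 398); the results are in table 3. For example, $\hat\kappa_{3\cdot}$ is 0.2941 in table 3, which agrees with $\hat\kappa_3$ in Bishop et al. The 95% confidence interval is $0.2941 \pm (1.96 \times \sqrt{0.0122})$ in table 3, which agrees with the interval [0.078, 0.510] in Bishop et al.

The variance under the null hypothesis of independence between the row and column classifiers, Var$_0$($\hat\kappa_{i\cdot}$), assumes that $p_{ii} = p_{i\cdot}p_{\cdot i}$. To compute Var$_0$($\hat\kappa_{i\cdot}$), substitute $p_{ii} = p_{i\cdot}p_{\cdot i}$ into Eq. 67, and use the variance under the assumption $E[\hat p_{ij}] = p_{i\cdot}p_{\cdot j}$ (see Eqs. 113, 114, and 117). An example of Var$_0$($\hat\kappa_{i\cdot}$) is given in table 3.


Conditional Kappa ($\kappa_{\cdot i}$) for Column $i$

The kappa conditioned on the $i$th column rather than the $i$th row (Eq. 56) is defined as:

$$\kappa_{\cdot i} = \frac{p_{ii} - p_{i\cdot}\,p_{\cdot i}}{p_{\cdot i} - p_{i\cdot}\,p_{\cdot i}}. \qquad [68]$$

The Taylor series approximation of Eq. 68 is derived similar to Eq. 66:

$$\kappa_{\cdot i} \approx \hat\kappa_{\cdot i} + \varepsilon_{ii}\frac{(p_{\cdot i}-p_{ii})(1-p_{i\cdot}-p_{\cdot i})}{(p_{\cdot i}-p_{i\cdot}p_{\cdot i})^2} - \sum_{j\ne i}\varepsilon_{ji}\frac{p_{ii}(1-p_{i\cdot})}{(p_{\cdot i}-p_{i\cdot}p_{\cdot i})^2} - \sum_{j\ne i}\varepsilon_{ij}\frac{(p_{\cdot i}-p_{ii})\,p_{\cdot i}}{(p_{\cdot i}-p_{i\cdot}p_{\cdot i})^2}. \qquad [69]$$

Equation 69 is used to derive Var($\hat\kappa_{\cdot i}$) similar to Eq. 67:

$$\begin{aligned}
\mathrm{Var}(\hat\kappa_{\cdot i}) \approx \frac{1}{(p_{\cdot i}-p_{i\cdot}p_{\cdot i})^4}\Big\{ & (p_{\cdot i}-p_{ii})^2(1-p_{i\cdot}-p_{\cdot i})^2\,E[\varepsilon_{ii}^2] \\
&+ p_{ii}^2(1-p_{i\cdot})^2\sum_{j\ne i}\sum_{s\ne i}E[\varepsilon_{ji}\varepsilon_{si}] \\
&+ p_{\cdot i}^2(p_{\cdot i}-p_{ii})^2\sum_{j\ne i}\sum_{s\ne i}E[\varepsilon_{ij}\varepsilon_{is}] \\
&- 2(p_{\cdot i}-p_{ii})(1-p_{i\cdot}-p_{\cdot i})\,p_{ii}(1-p_{i\cdot})\sum_{j\ne i}E[\varepsilon_{ii}\varepsilon_{ji}] \\
&- 2(1-p_{i\cdot}-p_{\cdot i})(p_{\cdot i}-p_{ii})^2\,p_{\cdot i}\sum_{j\ne i}E[\varepsilon_{ii}\varepsilon_{ij}] \\
&+ 2p_{ii}(1-p_{i\cdot})(p_{\cdot i}-p_{ii})\,p_{\cdot i}\sum_{j\ne i}\sum_{s\ne i}E[\varepsilon_{ji}\varepsilon_{is}]\Big\}. \qquad [70]
\end{aligned}$$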


The  variance  under  the  null  hypothesis  of  indepen- 
dence between  the  row  and  column  classifiers, 
Var0  (£,.),  assumes  that  pj7  =  py  p; .  To  compute 
Var0  (£,),  substitute  pu  =  p,  p4  into  Eq.  70,  and  use  the 
variance  under  the  assumption  Ii[p.  ]  =  P/.p.,  (see  Eqs. 
113,  114,  and  117).  The  validity  of  this  approximation 
was  partially  checked  by  using  p'  in  the  example  pro- 
vided by  Bishop  et  al.  (1975);  the  results  are  in  table  4. 


Matrix  Formulation  of  Var(?cJ )  and  VarOcJ 

The  formulae  above  can  be  expressed  in  matrix  alge- 
bra, which  facilitates  numerical  implementation  with 
matrix  algebra  software.  The  py  and  py  terms  that  de- 
fine iti  in  Eq.  56  are  computed  with  Eqs.  2  and  3  or 
matrix  Eqs.  35  and  36.  The  linear  approximation  of 
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Var  (£,-.)  in  Eq.  67  uses  the  following  terms.  First,  de- 
fine the  diagonal  kxk  matrix  using  the  definitions 
of  p;  and  p.j  in  Eqs.  2  and  3,  in  which  all  elements 
equal  zero  except  for  the  diagonal: 

m*M^&        '  1711 

which  corresponds  to  the  second  term  in  Eq.  66.  An 
example  of  H,-.  is  given  in  table  3.  Define  the  kxk  ma- 
trix My.,  in  which  all  elements  equal  zero  except  for 
the  ith  column: 


(^M'is/s,,;  ™ 

which  corresponds  to  the  third  term  in  Eq.  66.  An  ex- 
ample of  M;  is  given  in  table  3.  Define  the  kxk  matrix 
Gy  ,  in  which  all  elements  are  zero  except  for  the  iith 
element: 

1 

[Gl)tt  =  p,n-p,Y  [73] 

which  corresponds  to  the  first  term  in  Eq.  66  plus 
abs[H-  )u  in  Eq.  71  plus  abs{M-  )jj  in  Eq.  72.  An  example 
of  Gy  is  given  in  table  3. 


Table  3.  —  Example  data1  from  Bishop  et  al.  (1975)  for  conditional  kappa  ( £.),  conditioned  on  the  row  classifier  (/), 
including  vectors  used  in  matrix  formulation.  Contingency  table  is  given  in  table  2. 


Vectors  used  in  matrix  computations 


G,(Eq.73) 

H,  (Eq.  71) 

M,  (Eq.  72) 

7=1 

7=2 

7=3 

7=1 

7=2 

7=3 

7=1 

7=2 

7=3 

4.4690 

0 

0 

-2.6197 

0 

0 

-1.3407 

0 

0 

0 

0 

0 

0 

-4.0614 

0 

-1.3407 

0 

0 

0 

0 

0 

0 

0 

-1.9548 

-1.3407 

0 

0 

0 

0 

0 

-2.6197 

0 

0 

0 

-0.5428 

0 

u 

O.  /  DOO 

A 

u 

0 

-4.0614 

A 
U 

A 
U 

n  RAO& 

—v.Ohzo 

A 

u 

0 

0 

0 

0 

0 

-1.9548 

0 

-0.5428 

0 

0 

0 

0 

-2.6197 

0 

0 

0 

0 

-0.9965 

o 

o 

o 

0 

-4.0614 

o 

o 

0 

-0  9965 

0 

0 

3.9095 

0 

0 

-1 .9548 

0 

0 

-0.9965 

Vectors  used  in  matrix  computations,  null 

hypothesis  k,. 

=  0 

G,  (Eq.  73) 

H,  (Eq.71,  p„  =p,p,) 

M, 

(Eq.  72,  pit  = 

PiPi) 

7=1 

7=2 

7=3 

7=1 

7=2 

7=3 

7=1 

7=2 

7=3 

4.4690 

0 

0 

-1 .9862 

0 

0 

-1.8000 

0 

0 

0 

0 

0 

0 

-1.5183 

0 

-1.8000 

0 

0 

0 

0 

0 

0 

0 

-1.1403 

-1.8000 

0 

0 

0 

0 

0 

-1.9862 

0 

0 

0 

-1.3585 

0 

0 

5.7536 

0 

0 

-1.5183 

0 

0 

-1.3585 

0 

0 

0 

0 

0 

0 

-1.1403 

0 

-1.3585 

0 

0 

0 

0 

-1.9862 

0 

0 

0 

0 

-1.4118 

0 

0 

0 

0 

-1.5183 

0 

0 

0 

-1.4118 

0 

0 

3.9095 

0 

0 

-1.1403 

0 

0 

-1.4118 

Resulting  statistics 

Cov{kj.) 

Cov(kj.) 

;' 

7=1 

7=2 

7=3 

7=1 

7=2 

7=3 

1 

0.2551 

0.0170 

0.0052 

0.0067 

0.0165 

0.0040 

0.0046 

2 

0.6004 

0.0052 

0.0191 

0.0000 

0.0040 

0.0161 

0.0021 

3 

0.2941 

0.0067 

0.0000 

0.0122 

0.0046 

0.0021 

0.0101 

'  The  covariance  matrix  for  the  estimated  joint  probabilities  ( pH )  is  estimated  assuming  the  multinomial  distribution  (see  Eqs.  46,  47,  128,  130,  131, 

and  132). 
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Cov(£. )  =  d'kCov[vecP)  dk  .  [75] 

See  Eqs.  104  and  105  for  examples  of  Cov(vecP).  An 
example  of  Cov(k^)  is  given  in  table  3.  The  estimated 
variance  foreach  equals  the  corresponding  diagonal 
element  of  Cov[Kt )  in  Eq.  75.  As  in  the  case  of  kw,  bet- 
ter approximations  of  injcv  ~  [vecV)'dk.  might  lead 
to  better  approximations'  of  Cov  [k-  ) . 

To  compute  the  covariance  matrix  under  the  null 
hypothesis  of  independence  between  the  row  and  col- 
umn classifiers,  Cov0(£.),  substitute  pH  =  p^p^m  Eqs. 
71,  72,  74  and  75;  and  use  the  variance  under  the  as- 


Table  4.  —  Example  data1  from  Bishop  et  al.  (1975)  for  conditional  kappa  (  k.j),  conditioned  on  the  column  classifier, 
including  vectors  used  in  matrix  formulation.  Contingency  table  is  given  in  table  . 

Vectors  used  in  matrix  computations 


G,  (Eq.  78)  H,  (Eq.  76)  M,  (Eq.  77) 


7=2 

y=3 

7=1 

/=2 

j=3 

7=1 

j=2 

7=3 

3.7674 

0 

0 

-1.3142 

0 

0 

-2.0015 

0 

0 

0 

0 

0 

0 

-0.6314 

0 

-2.0015 

0 

0 

0 

0 

0 

0 

0 

-0.9333 

-2.0015 

0 

0 

0 

0 

0 

-1.3142 

0 

0 

0 

-3.1331 

0 

0 

4.9608 

0 

0 

-0.6314 

0 

0 

-3.1331 

0 

0 

0 

0 

0 

0 

-0.9333 

0 

-3.1331 

0 

0 

0 

0 

-1.3142 

0 

0 

0 

0 

-3.3221 

0 

0 

0 

0 

-0.6314 

0 

0 

0 

-3.3221 

0 

0 

5.3665 

0 

0 

-0.9333 

0 

0 

-3.3221 

Vectors  used  in  matrix  computations,  null  hypothesis  k\ 

=0 

G,  (Eq.  73) 

H,  (Eq.71,  pu  =  p,p,) 

(Eq.  72,  p..  =  p.p., 

) 

7=1 

y=2 

y=3 

7=1 

7=2 

/=3 

/=1 

7=2 

y=3 

3.7674 

0 

0 

-1 .6744 

0 

0 

-1.5174 

0 

0 

0 

0 

0 

0 

-1.3091 

0 

-1.5174 

0 

0 

0 

0 

0 

0 

0 

-1.5652 

-1.5174 

0 

0 

0 

0 

0 

-1 .6744 

0 

0 

0 

-1.1713 

0 

0 

4.9608 

0 

0 

-1.3091 

0 

0 

-1.1713 

0 

0 

0 

0 

0 

0 

-1 .5652 

0 

-1.1713 

0 

0 

0 

0 

-1 .6744 

0 

0 

0 

0 

-1.9379 

0 

0 

0 

0 

-1.3091 

0 

0 

0 

-1.9379 

0 

0 

5.3665 

0 

0 

-1 .5652 

0 

0 

-1.9379 

Resulting  statistics 

Cov(k,) 

Cov0(kh) 

/ 

7=1 

y=2 

7=3 

7=1 

/=2 

7=3 

1 

0.2151 

0.0124 

0.0033 

0.0079 

0.0117 

0.0029 

0.0053 

2 

0.5177 

0.0033 

0.0166 

0.0010 

0.0029 

0.0120 

0.0024 

3 

0.4037 

0.0079 

0.0010 

0.0207 

0.0053 

0.0024 

0.0191 

'  Tfte  covariance  matrix  for  the  estimated  joint  probabilities  ( pa )  is  estimated  assuming  the  multinomial  distribution  (see  Eqs.  46,  47,  128,  130,  131, 
and  132). 


The  linear  approximation  of  *cf  equals  {vecV)'d.k.  , 
where  the  k2xk  matrix  equals: 


G2 

+ 

M2. 

+ 

M 

M 

M 

[74] 


An  example  of  is  given  in  table  3.  The  kxk  covari- 
ance matrix  for  the  kxl  vector  of  conditional  kappa  sta- 
tistics Ki  equals: 
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sumption  E[p}j]  =  ptp.  (see  Eqs.  113, 114,  and  117).  An 
example  of  Cov0 [k-]  is  given  in  table  3. 

The  linear  approximation  of  Var  [k .•)  in  Eq.  70,  which 
is  used  to  estimate  the  precision  of  the  kappa  condi- 
tioned on  the  column  classifier  (£,,),  can  also  be  ex- 
pressed in  matrix  algebra.  As  in  Eq.  71,  define  the  di- 
agonal kxk  matrix  H  , ,  in  which  all  elements  equal  zero 
except  for  the  diagonal: 


iH,h=- 


(Pi-Pa) 
P,(l-A)2 


[76] 


which  corresponds  to  the  second  term  in  Eq.  70.  An 
example  of  H  y  is  given  in  table  4.  As  in  Eq.  72,  define 
the  kxk  matrix  M.y ,  in  which  all  elements  equal  zero 
except  for  the  ith  column: 


partial  kappa  statistics  from  the  same  error  matrix  (P). 
The  variance  of  this  difference  is  used  to  test  the  hy- 
pothesis that  the  difference  between  kv  and  k2.  is  zero, 
i.e.,  the  conditional  kappas  are  the  same,  and  hence, 
the  accuracy  in  classifying  objects  into  categories  i=l 
and  i=2  is  the  same.  Table  3  provides  an  example.  The 
difference  between  k1.  and  k2.  is  (0.2551-0.6004)  = 
-0.3453;  the  variance  of  this  estimated  difference  is 
0.0170+0.0191+(2x0.0052)  =  0.0465;  the  standard  devia- 
tion is  0.2156;  and  the  95%  confidence  interval  is 
-0.3453  ±  (1.96x0.2156)  =  [-0.7679,0.0773].  Since  this 
interval  contains  zero,  we  fail  to  reject  (at  the  95%  level) 
the  null  hypothesis  that  the  two  classifiers  have  the  same 
agreement  when  the  row  classification  is  category  i=l 
or  i=2.  This  test  might  have  limited  power  to  detect  true 
differences  in  accuracy  for  specific  categories. 


Pa 


,l<j<n, 


[77] 


which  corresponds  to  the  third  term  in  Eq.  70.  An  ex- 
ample of  M.j  is  given  in  table  4.  As  in  Eq.  73,  define  the 
kxk  matrix  G.f,  in  which  all  elements  are  zero  except 
for  the  j'ith  element: 


(G,),= 


Pi^-Pi.y 


[78] 


which  corresponds  to  the  first  term  in  Eq.  70  plus 
abs[H t) H  in  Eq.  76  plus  abs{M-  )u  in  Eq.  77.  An  example 
of  G  4  is  given  in  table  4. 

The  linear  approximation  of  Kmi  equals  [vecP)'dk  , 
where  the  k2xk  matrix  d^.  equals: 


cL  = 


H, 

G2 

M.2 

+ 

+ 

M 

M 

M 

M.3 

[79] 


CONDITIONAL  PROBABILITIES 

Fleiss  (1981,  p.  214)  gives  an  example  of  assessing 
classification  accuracy  for  individual  categories  with 
conditional  probabilities.  An  example  of  a  conditional 
probability  is  the  probability  of  correctly  classifying  a 
member  of  the  population  (e.g.,  a  pixel)  as  forest  given 
that  the  pixel  is  classified  as  forest  with  remote  sens- 
ing. Let  pjyi  yj  represent  the  conditional  probability  that 
the  row  classification  is  category  i  given  that  the  col- 
umn classification  is  category  /;  in  this  case: 


Pu\n 


El 
Pi 


[81] 


The  variance  for  an  estimate  of  p(ii ,j]  can  be  approxi- 
mated with  the  Taylor  series  expansion  as  in  Eq.  57. 
First,  p.  is  factored  out  of  Eq.  81  using  Eq.  59  so  that  the 
partial  derivatives  can  be  computed: 


Pu[n  = 


Pa 


ai\r+Pr 


,l<r<k. 


[82] 


An  example  of  is  given  in  table  4.  The  kxk  covari- 
ance  matrix  for  the  kxl  vector  of  conditional  kappa  sta- 
tistics kj  equals: 


Cov[k .)  =  d;  (Cov(vecP)) 


[80] 


See  Eqs.  104  and  105  for  examples  of  Cov(vecP).  An 
example  of  Cov(rt)  is  given  in  table  4.  The  estimated 
variance  for  each  k,t  equals  the  corresponding  diagonal 
element  of  Cov(^  J)  in  Eq.  80.  To  compute  the  covari- 
ance  matrix  under  the  null  hypothesis  of  independence 
between  the  row  and  column  classifiers,  Covc  (£,),  sub- 
stitute pa  =  PiP;  into  Eqs.  76,  77,  78,  79,  and  80;  and 
use  the  variance  under  the  assumption  E[pij\  =  P;.p.i 
(see  Eqs.  113,  114,  and  117).  An  example  of  Cov0(k-,j 
is  given  in  table  4. 

The  off-diagonal  elements  of  Cov(£.)  and  Cov(?cJ) 
can  be  used  to  estimate  precision  of  differences  between 


The  partial  derivative  of  Eq.  82  with  respect  to  p^  is: 

_    Pj  "Ay  _Pj-Pij 


dPv\n 

dPrj 


\Pn=Pn 


tijU+Pijf  P2i 


,r  =  i, 


[83] 


Pi 


\Prj=Prj 


-=-E± 

^■ilr+Prj)2  P2j 


,r*i, 


The  Taylor  series  approximation  of  Var  ( p(j| ;))  is  made 
similar  to  Eq.  57  for  Var(£.): 


'PU\j) 


Pj-Pij 


\Pri=Prj 


P2I 


+  >  e. 


r=l 
r*i 


V  Pi  J 

[84] 
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( 


'Pu\i) 


Pi -Pi. 

;  2 
Pi 


.Ely 
p2  ^ 

f}  r= 


1 
r*i 


P; 


£  . 

y  j  s=\ 


Matrix  Formulation  for  Varfp^.J  and  Vartp^.) 

Equation  88  can  be  expressed  in  matrix  algebra  as 
follows.  First,  define  the  kxk  matrix  H  («,  in  which  all 
elements  equal  zero  except  for  the  ith  column: 


Var(p,|,)  = 
(P.;  ~Ay) 


.4  2^,-2^#i 
Pi  Pi 

r*i 

~2     k  k 


X*[e./£,] 


+ffxx^>- 


P  j   r=l  s=l 
r*i  s*i 


[85] 


Conditional  probabilities  that  are  conditioned  on  the 
row  classification,  rather  than  the  column  classification, 
are  also  useful.  Let  p(i|;.0  represent  the  conditional  prob- 
ability that  the  column  classification  is  category  i  given 
that  the  row  classification  is  category  /;  in  this  case: 

Pfi  r  ! 

Pu\n=J-'  [86] 

The  variance  for  an  estimate  of  p^U)  can  be  approximated 
with  the  Taylor  series  expansion  as  in  Eqs.  82  to  85: 

Var(p/|;,)  = 

Pj  Pi 


r=l 


f)2     k  k 

+fXX*w 

Pi   r=i  s=l 
r*i  s*i 


[87] 


Of  special  interest  is  the  case  in  which  i  =  /,  i.e.,  the 
conditional  probabilities  on  the  diagonal  of  the  error 
matrix  ( P).  In  this  case, 


Var(pf|,)  = 
iPi-Piif 


£.4 
Pi 


E[e2a]-2 


(Pi-Pii)Pii 


Pi 


r=l 
r*i 


2  k  k 
Pi    r=l  s=l 


[88] 


(Hp(,,)ri.=£f-,l<r<i:. 
P, 


[90] 


Equation  90  corresponds  to  the  second  term  in  Eq.  84, 
and  an  example  is  given  in  table  5.  Define  the  kxk  ma- 
trix Gp(  /r ,  in  which  all  elements  are  zero  except  for  the 
iith  element: 


Pi 


[91] 


Equation  91  corresponds  to  the  first  term  in  Eq.  84,  and 
an  example  is  given  in  table  5.  The  linear  approxima- 
tion of  p(ji  ,  equals  (vecP)'D(jiir ,  where  is  the  kxl 
vector  of  diagonal  conditional  probabilities  with  its  ith 
element  equal  to  p^.  The  kzxk  matrix  D(ii  y)  equals: 


Gp(l)n 

HP(U 

Gp(-2) 

+ 

Hp(-2) 

M 

M 

/*/>(■*)_ 

[92] 


An  example  of  Dtfi«  (Eq.  92)  is  given  in  table  5.  The  kxk 
covariance  matrix  for  the  kxl  vector  of  estimated  con- 
ditional probabilities  p[j^i]  on  the  diagonal  of  the  error 
matrix  (conditioned  on  the  column  classification) 
equals: 


Cov(pul,)  =  D',,Cov(yecP)  D(,,. 


[93] 


See  Eqs.  104  and  105  for  examples  of  Cov(vecP).  An 
example  of  Eq.  93  is  given  in  table  5. 

The  variance  of  the  estimated  conditional  probabili- 
ties that  are  conditioned  on  the  column  classifications 
( p(j.i;..j  in  Eq.  89)  can  similarly  be  expressed  in  matrix 
form.  First,  define  the  kxk  diagonal  matrix  H  (i.r,  in 
which  all  elements  equal  zero  except  for  the  diagonal 
elements  (iz): 


Pi- 


[94] 


Var(pi|l,)  = 

*2 


Pi-  Pi- 


r=l 
r*i 


*  2     k  k 

+fXX^- 

Pi    r=l  s=l 


[89] 


An  example  is  given  in  table  5.  Define  the  kxk  matrix 
G  ,  in  which  all  elements  are  zero  except  for  the  iith 
element: 

1  pU))ii     p,'  [95] 

An  example  is  given  in  table  5.  The  linear  approxima- 
tion of  pyj^  equals  [vecPYD^,  where  ptJuj  is  the  kxl 
vector  of  diagonal  probabilities  conditioned  on  the  row 
classification.  The  k2xk  matrix  D(j.|(,,  equals: 
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Gp(l) 

HP(1) 

+ 

TT 

M 

M 

[96] 


An  example  is  given  in  table  5.  The  kxk  covariance 
matrix  for  the  kxl  vector  of  estimated  conditional  prob- 
abilities (ppio)  on  the  diagonal  of  the  error  matrix  (con- 
ditioned on  the  row  classification)  equals: 


[97] 

See  Eqs.  104  and  105  for  examples  of  Cov(vecP).  An 
example  of  Eq.  97  is  given  in  table  5. 

Test  for  Conditional  Probabilities 
Greater  Than  Chance 

It  is  possible  to  test  whether  an  observed  conditional 
probability  is  greater  than  that  expected  by  chance.  In 


Cov(p(y|,))  =  D;/|,)  (Cov(vecP))  D(i 


Table  5.  —  Examples  of  conditional  probabilities1  and  intermediate  matrices  using  contingency  table  in  table  2. 


Conditional  probabilities,  conditioned  on  columns  ( P(/|./)) 


Approximate  95% 


/' 

Pa 

Pi 

P{i\i) 

(Eq.  93) 

confidence  bounds 

1 

0.2361 

0.4444 

0.5313 

0.0078 

[0.3583,  0.7042] 

2 

0.1667 

0.2639 

0.6316 

0.0122 

[0.4147,  0.8485] 

3 

0.1806 

0.2917 

0.6190 

0.0112 

[0.4113,  0.8268] 

Gp(  /)  (Eq.  91) 

Hp(./)  (Eq.  90) 

Om  (Eq.  92) 

/=1 

/=2 

/=3 

/=1 

/=2 

j=3 

y=i 

/=2 

/=3 

2.2500 

0 

0 

-1.1953 

0 

0 

1 .0547 

0 

0 

0 

0 

0 

-1.1953 

0 

0 

-1.1953 

0 

0 

0 

0 

0 

-1.1953 

0 

0 

-1.1953 

0 

0 

0 

0 

0 

0 

-2.3934 

0 

0 

-2.3934 

0 

0 

3.7895 

0 

0 

-2.3934 

0 

0 

1.3961 

0 

0 

0 

0 

0 

-2.3934 

0 

0 

-2.3934 

0 

0 

0 

0 

0 

0 

-2.1224 

0 

0 

-2.1224 

0 

0 

0 

0 

0 

-2.1224 

0 

0 

-2.1224 

0 

0 

3.4286 

0 

0 

-2.1224 

0 

0 

1.3062 

Conditional  probabilities,  conditioned  on  rows  (  p(l\n) 


d/ag[Cov(p(,1/  ))]  Approximate  95% 

/  p„  p,  P(,|,)  (Eq.  97)  confidence  bounds 


1 

0.2361 

0.4028 

0.5862 

0.0084 

[0.4070,  0.7655] 

2 

0.1667 

0.2361 

0.7059 

0.0122 

[0.4893,  0.9225] 

3 

0.1806 

0.3611 

0.5000 

0.0096 

[0.3078,  0.6922] 

Gp(/ )  (Eq.  95) 

Hp(  /}  (Eq.  94) 

Dm  (Eq.  96) 

1=2 

y=3 

7=2 

j=3 

J=2 

/=3 

2.4828 

0 

0 

-1 .4554 

0 

0 

1 .0273 

0 

0 

0 

0 

0 

0 

-2.9896 

0 

0 

-2.9896 

0 

0 

0 

0 

0 

0 

-1.3846 

0 

0 

-1 .3846 

0 

0 

0 

-1.4554 

0 

0 

-1.4554 

0 

0 

0 

4.2353 

0 

0 

-2.9896 

0 

0 

1 .2457 

0 

0 

0 

0 

0 

0 

-1.3846 

0 

0 

-1.3846 

0 

0 

0 

-1 .4554 

0 

0 

-1.4554 

0 

0 

0 

0 

0 

0 

-2.9896 

0 

0 

-2.9896 

0 

0 

0 

2.7692 

0 

0 

-1 .3846 

0 

0 

1 .3846 

1The  covariance  matrix  for  the  estimated  joint  probabilities  (fy )  is  estimated  assuming  the  multinomial  distribution  (see  Eqs.  46,  47,  128,  130,  131, 
and  132). 


16 


most  cases,  practical  interest  is  confined  to  the  condi- 
tional probabilities  on  the  diagonal  ( p{l^}]  or  p^J.  This 
is  closely  related  to  the  hypothesis  that  the  conditional 
kappa  ( K.i  or  k-  )  is  no  greater  than  that  expected  by  chance 
(see  Var0  (£f.)  and  Var0  (tc.J  following  Eqs.  67  and  70). 

The  proposed  test  is  based  on  the  null  hypothesis  that 
the  difference  between  an  observed  conditional  prob- 
ability and  its  corresponding  conditional  probability 
expected  under  independence  between  the  row  and 
column  classifiers  is  not  greater  than  zero.  First,  con- 
sider the  conditional  probability  on  the  diagonal  of  the 
ith  row  that  is  expected  if  classifiers  are  independent: 


Pi 


[98] 


in  Eq.  98  is  defined  in  Eq.  2,  but  the  Taylor  series 
approximation  of  p;-  can  be  expressed  differently  in 
matrix  algebra  as: 

p;.  =  veer  D, .  [99] 

Recall  that  py  in  Eq.  99  is  the  ^xl^vector  in  which  the 
ith  element  is  py.  (Eq.  35),  and  vecV'  is  the  transpose  of 
the  k2xl  vector  version  of  the  kxk  error  matrix  P(Eqs. 
35  and  36).  In  Eq.  99,  D;  is  a  k2xk  matrix  of  zeros  and 
ones,  where  D-  =  (I  I  L  1 1)'  and  I  is  the  kxk  identity 
matrix.  Let  the  2kxl  vector  p  •_  equal  (p(i-|.;)|p' )',  where 
Pyi.fl  is  the  kxl  vector  of  observed  conditional  probabili- 
ties (Eq.  92)  and  py  is  the  kxl  vector  of  expected  condi- 
tional probabilities  under  the  independence  hypothesis 
(Eq.  99).  The  covariance  matrix  for  p ;_  using  the  Taylor 
series  approximation  is: 


Cov0  (p,)  =  D'i_  Cov0(vecP) 


D 


100] 


where  D ;_  is  the  k2x2k  matrix  equal  to  [D^.JDJ , 
is  defined  in  Eq.  92,  and  D-  is  defined  following  Eq. 
99.  An  example  of  D  2_  is  given  in  table  6.  The  covariance 
matrix  expected  under  the  null  hypothesis,  CovG  (vecP), 
is  used  in  Eq.  100  (see  Eqs.  113,  114,  and  117). 

The  Taylor  series  approximation  of  the  kxl  vector  of 
differences  between  the  observed  conditional  probabili- 
ties on  the  diagonal  (p(ii .})  and  their  expected  values 
under  the  independence  hypothesis  (py  )  equals 
p'y_ [l|— I]' ,  where  [l|-I]'  is  a  2icxjc  matrix  of  ones  and  ze- 
ros (table  6),  and  I  is  the  kxk  identity  matrix.  Since  this 
represents  a  simple  linear  transformation,  the  Taylor 
series  approximation  of  its  covariance  matrix  is: 


Cov0  (p(j,f)-p,)  =  [l|-I]  Covjp.,.) 


101] 


Equation  101  can  be  combined  with  Eq.  100  for 
Cov0(p.J_)  to  make  the  expression  more  succinct  with 
respect  to  Cov0  (vecP): 


CoVo  (Pfili)-Pj 


Cov0  [vecP] 


D 


[102] 


where  D(l|iH.  D(jj.iH.  =  D. ,  -  [I  I  -I]'  in  Eq.  102.  An  ex- 
ample of  D(j|  jH  is  given  in  table  6. 


A  similar  test  can  be  constructed  for  the  diagonal  prob- 
abilities conditioned  on  the  row  classification,  in  which 
the  null  hypothesis  is  independence  between  classifi- 
ers given  the  row  classification  is  category  i,  i.e., 
#[Pl/k)]  =  p.,  (see  Eq.  98).  Define  D  2,  as  a  k2xk  matrix  of 
zeros  and  ones  defined  as  follows.  Let  D  a  be  the  kxk 
matrix  with  ones  in  the  first  column  and  zeros  in  all 
other  elements,  let  D  2  be  the  kxk  matrix  with  ones  in 
the  second  column  and  zeros  in  all  other  elements,  and 


L  D',  )'.  As  inEq.  100, 


as  the  k2x2k  matrix  equal  to  [D^.jI  D./:J , 
'(jj.)  is  given  in  Eq.  96.  The  approximate  covari- 


so  forth;  then,  D _h  equals  (D'3 
define  DJ 
where  D( 

ance  matrix  for  the  kxl  vector  of  differences  between 
the  observed  and  expected  conditional  probabilities  is 
derived  as  in  Eq.  102: 


CovG  (p(l1l. ,  -  p  )  =  D'.. ,  i  Cov0  (vecP) 


D 


Uli-)- 


[103] 


where  D, 


=  D 


[iu.y.i  —  is.jAlr-il' •  The  covariance  matrix  ex- 
pected under  the  null  hypothesis,  Cov0  (vecP),  is  used 
in  Eq.  103  (see  Eqs.  113,  114,  and  117).  An  example  of 
Eq.  103  is  given  in  table  6. 

The  variances  on  the  diagonal  of  Cov0(p(^./)  -pj  in 
Eqs.  101  or  102,  Cov0(p(i|/.)  -p  j  in  Eq.  103,  can  be  used 
to  estimate  an  approximate  probability  of  the  null  hy- 
pothesis being  true.  It  is  assumed  that  the  distribution 
of  random  errors  is  normal  in  the  estimate  of  (p^.j)  ) 
or  (Pu|y.)-P,).  and  Covjp^-p,)  and  Cov^p,^ -pj 
are  accurate  estimates.  A  one-tail  test  is  used  because 
practical  interest  is  confined  to  testing  whether  the  ob- 
served conditional  probabilities  are  greater  than  those 
expected  by  chance.  An  example  of  these  tests  is  given 
in  table  6.  Tests  on  conditional  probabilities  might  be 
more  powerful  than  tests  with  the  conditional  kappa 
statistics  because  the  covariance  matrix  for  conditional 
probabilities  use  fewer  estimates.  This  will  be  tested 
with  Monte  Carlo  simulations  in  the  future. 

Examples  given  by  Green  et  al.  (1993)  were  used  to  par- 
tially validate  the  variance  approximations  in  Eqs.  93  and 
97.  This  validation  is  limited  by  its  empirical  nature  and 
use  of  the  multinomial  distribution  for  stratified  sampling. 


COVARIANCE  MATRICES  FOR  Eie^eJ  AND  vecV 

Estimated  variances  of  accuracy  assessment  statistics 
require  estimates  of  the  covariances  of  random  errors 
between  estimated  cells  in  the  contingency  table  (p). 
These  are  denoted  £[£,,£1  for  the  covariance  between 
cells  and  {r,s}  in  p,  or  Cov  (vecP)  for  the  k2xk2  covar- 
iance matrix  of  all  covariances  associated  with  the  vec- 
tor version  of  the  estimated  contingency  table  (vecP). 
Key  examples  of  the  need  for  these  covariance  estimates 
are  in  Eqs.  25  and  43  for  the  weighted  kappa  statistic 
( Kw)',  Eq.  33  for  the  unweighted  kappa  statistic  ( ic);  Eqs. 
67,  70,  75,  and  80  for  the  conditional  kappa  statistics 
[Ki  and  ffj;  Eqs.  85,  87,  88,  89,  93,  and  97  for  condi- 
tional probabilities  (p^  and  p^  );  and  Eqs.  101,  102, 
and  103  for  differences  between  diagonal  conditional 
probabilities  and  their  expected  values  under  the  inde- 
pendence assumption. 
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Table  6.  — 


Examples  of  tests  with  conditional  probabilities1  and  intermediate  matrices  using  contingency  table  in  table  2 

and  conditional  probabilities  in  table  5. 


Conditional  probabilities,  conditioned  on  columns  p, 

d/ap[Cov(p//|./)-p/.)] 
P(/l/)  (Eqs.  101  and  102) 


/■ 

Pli 

Pi 

Pm 

Pi 

~Pi 

Variance2 

Std.  Dev. 

z-value3 

■  A 

p-value4 

1 

0.2361 

0.4444 

0.5313 

0.4028 

0.1285 

0.0045 

0.0668 

1.9231 

0.0272 

2 

0.1667 

0.2639 

0.6316 

0.2361 

0.3955 

0.0130 

0.1142 

3.4622 

0.0003 

3 

0.1806 

0.2917 

0.6190 

0.3611 

0.2579 

0.0100 

0.1001 

2.5760 

0.0050 

D./_  =  [D(/|.n  I  D/.] 

□(/(./)_/.  =D./_[I  I  -I]' 

(Eqs.  92,  99,  100) 

(Eqs.  100,  101, 102) 

;  A 
/=' 

7=3 

7=4 

i—ti 
J-O 

i— R 
7-° 

7-' 

f— o 

7-* 

7=3 

1  n^47 

0 

0 

1 

o 

0 

0.0547 

o 

n 

-1.1953 

0 

0 

0 

1 

0 

-1.1953 

-1 

0 

—  i  .1  yoo 

0 

0 

0 

n 

i 
i 

i.i  yoo 

n 
u 

—  I 

0 

-2.3934 

0 

1 

0 

0 

-1 

-2.3934 

0 

0 

1.3961 

0 

0 

1 

0 

0 

0.3961 

0 

u 

-2.3934 

0 

0 

o 

1 

1 

n 

—  I 

0 

0 

-2.1224 

1 

0 

0 

-1 

0 

-2.1224 

0 

0 

-2.1224 

0 

1 

0 

0 

-1 

-2.1224 

u 

0 

1.3061 

0 

o 

1 

o 

U.OUD  1 

[I|-I]'  (Eq.  101) 

7=1 

7=2 

7=3 

1 

0 

0 

0 

1 

0 

0 

0 

1 

-1 

0 

0 

u 

-1 

0 

0 

0 

-1 

Conditional  probabilities,  conditioned  on  rows  (  P(, 

n) 

diag[Cov(p(m-p.j)] 

P(i\h)  (Eq.  103) 

/ 

Pii 

Pi. 

Pm 

Pi 

-Pi 

Variance1 

Std.  Dev. 

z-value 

p-value 

1 

0.2361 

0.4028 

0.5862 

0.4444 

0.1418 

0.0055 

0.0742 

1 .91 17 

0.0280 

2 

0.1667 

0.2361 

0.7059 

0.2639 

0.4420 

0.0175 

0.1323 

3.3405 

0.0004 

3 

0.1806 

0.3611 

0.5000 

0.2917 

0.2083 

0.0061 

0.0784 

2.6580 

0.0039 

D, 

f._=[D//|/aID./] 

D{i\.i)-.i=Dj.-[l\-l]' 

(Eq.  103) 

(Eq.  103) 

y=i 

y=2 

7=3 

7=4 

7=5 

7=6 

7=1 

7=2 

7=3 

1 .0273 

0 

0 

1 

0 

0 

0.0273 

0 

0 

0 

-2.9896 

0 

1 

0 

0 

-1 

-2.9896 

0 

0 

0 

-1.3846 

1 

0 

0 

-1 

0 

-1.3846 

-1 .4554 

0 

0 

0 

1 

0 

-1 .4554 

-1 

0 

0 

1 .2457 

0 

0 

1 

0 

0 

0.2457 

0 

0 

0 

-1 .3846 

0 

1 

0 

0 

-1 

-1 .3846 

-1 .4554 

0 

0 

0 

0 

1 

-1.4554 

0 

-1 

0 

-2.9896 

0 

0 

0 

1 

0 

-2.9896 

-1 

0 

0 

1 .3846 

0 

0 

1 

0 

0 

0.3846 

'  The  covariance  matrix  for  the  estimated  joint  probabilities  ( p, )  is  estimated  assuming  the  multinomial  distribution  (see  Eqs.  46,  47,  128,  130,  131, 
and  132).  ' 

2  The  variance  of  ( p„|,;)_p,J  equals  the  diagonal  elements  of  Cov[(p(,| -  p, )] . 

3  The  z-value  is  the  difference  (p^  ()_p, )  divided  by  its  standard  deviation. 

4  The  p-value  is  the  approximate  probability  that  the  null  hypothesis  is  true.  The  null  hypothesis  is  that  the  observed  conditional  probability  is  not 
greater  than  the  conditional  probability  expected  if  the  two  classifiers  are  independent.  This  is  a  one-tail  test  that  assumes  the  estimation  errors  are 
normally  distributed. 
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The  multinomial  distribution  pertains  to  the  special 
case  of  simple  random  sampling,  in  which  each  sample 
unit  is  independently  classified  into  one  and  only  one 
mutually  exclusive  category  using  each  of  two  classifi- 
ers. Up  until  recently,  variance  estimators  for  accuracy 
assessment  statistics  have  been  developed  only  for  this 
special  case. 

Covariances  for  the  multinomial  distribution  are  given 
in  Eqs.  46  and  47,  where  they  were  used  to  verify  that 
Vai{Kw)  in  Eq.  25  agrees  with  the  results  of  Everitt 
(1968)  and  Fleiss  et  al.  (1969).  These  can  also  be  ex- 
pressed in  matrix  form  as: 


Cov  ( vecV)  =  (1  -  F)  diag  (vecP)  -  vecV  [vecV)' 


In,  [104] 


where  n  is  the  sample  size  of  units  that  are  classified 
into  one  and  only  one  ^category  by  each  of  the  two 
classifiers;  and  diag  [vecV]  is  the  k2xk2  matrix  with  vecY 
on  its  main  diagonal,  with  all  other  elements  equal  to  zero 
(i.e.,  diag[vecP)rr  =  vecPr  for  all  r,  and  diag  (vecP)rs  =  0 
for  all  r^s).  [1-F]  in  Eq.  104  is  the  finite  population  correc- 
tion factor,  which  represents  the  proportional  difference 
between  the  multinominal  and  multivariate  hypergeo- 
metric  distributions.  F  equals  zero  if  sampling  is  with 
replacement  or  really  zero  if  the  population  size  is  large 
relative  to  the  sample  size,  which  is  the  usual  case  in 
remote  sensing  (e.g.,  the  number  of  randomly  selected 
pixels  for  reference  data  is  an  insignificant  proportion 
of  all  classified  pixels).  An  ^example  of  this  type  of  co- 
variance  matrix  is  Cov(vecP)  in  table  2. 

However,  there  are  many  other  types  of  reference  data 
that  do  not  fit  the  multinomial  or  hypergeometric  mod- 
els. Cov(vecP)  might  be  the  following  sample  covari- 
ance  matrix  for  a  simple  random  sample  of  cluster-plots: 


V t[vecYT  -  vecF)[vecPr  -  vecP)' 
Cov(vecP)  =  ^  


71 


11 


[105] 


vecP  = 


—  r=l 


72 


where  n  is  the  sample  size  of  cluster-plots  and  vecPr  is 
the  Fxl  vector  version  of  the  kxk  contingency  table  or 
"error  matrix"  for  the  rth  cluster  plot.  Czaplewski  (1992) 
gives  another  example  of  Cov  (vecP),  in  which  the  mul- 
tivariate composite  estimator  is  used  with  a  two-phase 
sample  of  plots  (i.e.,  the  first-phase  plots  are  classified 
with  less-expensive  aerial  photography,  and  a 
subsample  of  second-phase  plots  is  classified  by  more- 
expensive  field  crews). 


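Covariances Under Independence Hypothesis

Under the hypothesis that the two classifiers are independent, and any agreement between the two classifiers is a chance event, $E[p_{ij}] = p_{i\cdot}p_{\cdot j}$. This affects $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ and $\mathrm{Cov}_0(\mathrm{vec}\hat P)$ for $\mathrm{Var}_0(\hat\kappa_w)$ in Eqs. 28 and 45; $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ for $\mathrm{Var}_0(\hat\kappa)$ in Eq. 34; $\mathrm{Var}_0(\hat\kappa_{i\cdot})$, $\mathrm{Var}_0(\hat\kappa_{\cdot i})$, $\mathrm{Cov}_0(\hat\kappa_{i\cdot})$, and $\mathrm{Cov}_0(\hat\kappa_{\cdot i})$ for certain tests with Eqs. 67, 70, 71, 72, 74, 75, and 80; and $\mathrm{Cov}_0(\mathrm{vec}\hat P)$ for $\mathrm{Cov}_0(\hat{\mathbf p}_{\cdot i\_})$ in Eq. 100, $\mathrm{Cov}_0(\hat{\mathbf p}_{(i|\cdot i)} - \hat{\mathbf p}_{i\cdot})$ in Eq. 101, and $\mathrm{Cov}_0(\hat{\mathbf p}_{(i|i\cdot)} - \hat{\mathbf p}_{\cdot i})$ in Eq. 103. The true $p_{i\cdot}$ and $p_{\cdot j}$ are unknown, but the following estimate is available: $E_0[p_{ij}] \approx \hat p_{i\cdot}\hat p_{\cdot j}$. In the special case of the multinomial distribution, $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ is readily estimated as follows, using Eqs. 46 and 47:

$$E_0[\varepsilon_{ij}^2] = \frac{(\hat p_{i\cdot}\hat p_{\cdot j})\left[1 - (\hat p_{i\cdot}\hat p_{\cdot j})\right]}{n}, \qquad [106]$$

$$E_0[\varepsilon_{ij}\varepsilon_{rs}] = \frac{-(\hat p_{i\cdot}\hat p_{\cdot j})(\hat p_{r\cdot}\hat p_{\cdot s})}{n}, \quad rs \ne ij. \qquad [107]$$

In matrix form, this is equivalent to:

$$\mathrm{Cov}_0(\mathrm{vec}\hat P) = \left[\mathrm{diag}(\mathrm{vec}\hat P_c) - \mathrm{vec}\hat P_c\,(\mathrm{vec}\hat P_c)'\right]/n, \qquad [108]$$

where $\hat P_c = \hat{\mathbf p}_{i\cdot}\hat{\mathbf p}_{\cdot j}'$ is the expected contingency table under the null hypothesis. For example, Eqs. 106 and 107 are used with Eq. 55 to show that $\mathrm{Var}_0(\hat\kappa_w)$ in Eq. 28 agrees with the results of Fleiss et al. (1969, Eq. 9).

$E_0[\varepsilon_{ij}\varepsilon_{rs}]$ is more difficult to estimate in the more general case. Using the first two terms of the multivariate Taylor series expansion (Eq. 10):

$$\hat p_{i\cdot}\hat p_{\cdot j} \approx p_{i\cdot}p_{\cdot j} + \sum_{r=1}^{k}\sum_{s=1}^{k}\varepsilon_{rs}\left.\frac{\partial\,(p_{i\cdot}p_{\cdot j})}{\partial p_{rs}}\right|_{p_{rs}=\hat p_{rs}}. \qquad [109]$$

Using Eqs. 58 and 59, the partial derivatives in the first-order Taylor series approximation are solved as follows: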


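$$p_{i\cdot}p_{\cdot j} = (\alpha_{i\cdot|j} + p_{ij})(\alpha_{\cdot j|i} + p_{ij}) = \alpha_{i\cdot|j}\,\alpha_{\cdot j|i} + p_{ij}(\alpha_{\cdot j|i} + \alpha_{i\cdot|j}) + p_{ij}^2,$$

$$\left.\frac{\partial\,(p_{i\cdot}p_{\cdot j})}{\partial p_{ij}}\right|_{p_{ij}=\hat p_{ij}} = (\hat\alpha_{\cdot j|i} + \hat\alpha_{i\cdot|j}) + 2\hat p_{ij} = (\hat p_{i\cdot} + \hat p_{\cdot j}), \qquad [110]$$

$$p_{i\cdot}p_{\cdot j} = (\alpha_{i\cdot|s} + p_{is})\,p_{\cdot j}, \quad 1 \le s \le k,\ s \ne j,$$

$$\left.\frac{\partial\,(p_{i\cdot}p_{\cdot j})}{\partial p_{is}}\right|_{p_{is}=\hat p_{is}} = \hat p_{\cdot j}, \qquad [111]$$

$$p_{i\cdot}p_{\cdot j} = p_{i\cdot}\,(\alpha_{\cdot j|r} + p_{rj}), \quad 1 \le r \le k,\ r \ne i,$$

$$\left.\frac{\partial\,(p_{i\cdot}p_{\cdot j})}{\partial p_{rj}}\right|_{p_{rj}=\hat p_{rj}} = \hat p_{i\cdot}. \qquad [112]$$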


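Substituting Eqs. 110, 111, and 112 into Eq. 109 gives $\hat p_{i\cdot}\hat p_{\cdot j} - p_{i\cdot}p_{\cdot j} \approx \hat p_{\cdot j}\sum_{s=1}^{k}\varepsilon_{is} + \hat p_{i\cdot}\sum_{r=1}^{k}\varepsilon_{rj}$, so that:

$$E_0[\varepsilon_{ij}^2] \approx \hat p_{\cdot j}^2\sum_{s=1}^{k}\sum_{s'=1}^{k}E[\varepsilon_{is}\varepsilon_{is'}] + 2\hat p_{i\cdot}\hat p_{\cdot j}\sum_{s=1}^{k}\sum_{r=1}^{k}E[\varepsilon_{is}\varepsilon_{rj}] + \hat p_{i\cdot}^2\sum_{r=1}^{k}\sum_{r'=1}^{k}E[\varepsilon_{rj}\varepsilon_{r'j}], \qquad [113]$$

where each expectation on the right-hand side is an element of $\mathrm{Cov}(\mathrm{vec}\hat P)$.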


Equation 113 provides an estimate of $E_0[\varepsilon_{ij}^2]$ under the null hypothesis of independence between classifiers, which is the diagonal of the $k^2\times k^2$ covariance matrix $\mathrm{Cov}_0(\mathrm{vec}\hat P)$. The off-diagonal elements are estimated with the Taylor series approximation as follows:


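$$E_0[\varepsilon_{ij}\varepsilon_{rs}] \approx E\left[\left(\hat p_{\cdot j}\sum_{u=1}^{k}\varepsilon_{iu} + \hat p_{i\cdot}\sum_{v=1}^{k}\varepsilon_{vj}\right)\left(\hat p_{\cdot s}\sum_{u=1}^{k}\varepsilon_{ru} + \hat p_{r\cdot}\sum_{v=1}^{k}\varepsilon_{vs}\right)\right]. \qquad [114]$$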
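The first-order approximation in Eqs. 109 to 114 amounts to a gradient calculation: the error in $\hat p_{i\cdot}\hat p_{\cdot j}$ is approximately $\mathbf g_{ij}'\boldsymbol\varepsilon$, where $\mathbf g_{ij}$ holds the partial derivatives of Eqs. 110 to 112, so $E_0[\varepsilon_{ij}\varepsilon_{rs}] \approx \mathbf g_{ij}'\,\mathrm{Cov}(\mathrm{vec}\hat P)\,\mathbf g_{rs}$. The sketch below assumes column-stacked vec ordering and evaluates the derivatives at the estimates; the function names are hypothetical.

```python
import numpy as np


def vec(a):
    """Column-stack a k x k matrix into a k^2 vector (vecA in Appendix A)."""
    return np.asarray(a, dtype=float).reshape(-1, order="F")


def grad_marginal_product(p, i, j):
    """Gradient of p_i. * p_.j with respect to vecP, evaluated at p (Eqs. 110-112).

    Cells (i, s), s != j, get p_.j (Eq. 111); cells (r, j), r != i, get p_i.
    (Eq. 112); cell (i, j) gets p_i. + p_.j (Eq. 110).
    """
    row = p.sum(axis=1)          # row marginals p_i.
    col = p.sum(axis=0)          # column marginals p_.j
    g = np.zeros_like(p, dtype=float)
    g[i, :] += col[j]            # every cell in row i
    g[:, j] += row[i]            # every cell in column j; cell (i, j) receives both terms
    return vec(g)


def null_cov_elementwise(p, cov_vecp):
    """E0[eps_ij eps_rs] ~= g_ij' Cov(vecP) g_rs (Eqs. 113-114).

    Rows and columns follow vecP ordering: index j*k + i for cell (i, j).
    """
    k = p.shape[0]
    g = np.stack([grad_marginal_product(p, i, j)
                  for j in range(k) for i in range(k)])   # k^2 x k^2 gradient matrix
    return g @ cov_vecp @ g.T
```

Any $\mathrm{Cov}(\mathrm{vec}\hat P)$ can be passed in, for example the multinomial form of Eq. 104 or the cluster-plot form of Eq. 105.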


Matrix Formulation for $E_0[\varepsilon_{ij}\varepsilon_{rs}]$

Equations 113 and 114 can be expressed in matrix algebra. First, define the $k^2\times k^2$ matrix $P_{i\cdot}$ as follows:


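$$P_{i\cdot} = \begin{bmatrix} P_i & 0 & \cdots & 0\\ 0 & P_i & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & P_i \end{bmatrix}, \qquad [115]$$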


where 0 is a $k\times k$ matrix of zeros, and $P_i$ is the $k\times k$ matrix in which every element of the $i$th column equals $p_{i\cdot}$ as defined in Eqs. 2 or 33. Table 7 includes an example of $P_{i\cdot}$ in Eq. 115. Next, define $P_{\cdot j}$ as the following $k^2\times k^2$ matrix:


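$$P_{\cdot j} = \begin{bmatrix} I p_{\cdot 1} & I p_{\cdot 2} & \cdots & I p_{\cdot k}\\ I p_{\cdot 1} & I p_{\cdot 2} & \cdots & I p_{\cdot k}\\ \vdots & \vdots & \ddots & \vdots\\ I p_{\cdot 1} & I p_{\cdot 2} & \cdots & I p_{\cdot k} \end{bmatrix}, \qquad [116]$$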


where $I$ is the $k\times k$ identity matrix, and $p_{\cdot j}$ is the scalar marginal for the $j$th column of the error matrix (Eqs. 3, 31, or 120). Table 7 includes an example of $P_{\cdot j}$ in Eq. 116.

The $k^2\times k^2$ covariance matrix for the estimated vector version of the error matrix, $\mathrm{Cov}_0(\mathrm{vec}\hat P)$, expected under the null hypothesis of independence between classifiers, equals:

$$\mathrm{Cov}_0(\mathrm{vec}\hat P) = (P_{i\cdot} + P_{\cdot j})'\,\mathrm{Cov}(\mathrm{vec}\hat P)\,(P_{i\cdot} + P_{\cdot j}), \qquad [117]$$

where $P_{i\cdot}$ and $P_{\cdot j}$ are defined in Eqs. 115 and 116, and $\mathrm{Cov}(\mathrm{vec}\hat P)$ is the $k^2\times k^2$ covariance matrix for the estimated vector version of the error matrix ($\mathrm{vec}\hat P$), examples of which are given in Eqs. 104 and 105. Table 7 includes an example of $\mathrm{Cov}_0(\mathrm{vec}\hat P)$ in Eq. 117. Equation 117 is merely a different expression of $E_0[\varepsilon_{ij}\varepsilon_{rs}]$ in Eqs. 113 and 114.
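Equations 115 to 117 can be assembled with Kronecker products. In this sketch, $P_{i\cdot} = I_k \otimes (\mathbf 1\,\mathbf p_{i\cdot}')$ and $P_{\cdot j} = (\mathbf 1\,\mathbf p_{\cdot j}') \otimes I_k$, where $\otimes$ denotes the Kronecker product (not the element-by-element product listed in Appendix A). These identities are this sketch's reading of Eqs. 115 and 116 under column-stacked vec, not expressions given in the paper; the function name is hypothetical.

```python
import numpy as np


def null_cov_matrix(p, cov_vecp):
    """Eq. 117 sketch: Cov0(vecP) = (P_i. + P_.j)' Cov(vecP) (P_i. + P_.j).

    p: k x k estimated error matrix; cov_vecp: k^2 x k^2 covariance of the
    column-stacked vecP (e.g., Eq. 104 or 105).
    """
    k = p.shape[0]
    row = p.sum(axis=1)                               # p_i.
    col = p.sum(axis=0)                               # p_.j
    ones = np.ones(k)
    p_i = np.outer(ones, row)                         # P_i: every element of column i equals p_i.
    p_row = np.kron(np.eye(k), p_i)                   # Eq. 115: P_i repeated on the block diagonal
    p_col = np.kron(np.outer(ones, col), np.eye(k))   # Eq. 116: block (s, j) equals I * p_.j
    a = p_row + p_col                                 # (P_i. + P_.j)
    return a.T @ cov_vecp @ a
```

Checking the result against the elementwise gradient form of Eqs. 113 and 114 is a useful guard against vec-ordering mistakes.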


STRATIFIED  SAMPLE  OF  REFERENCE  DATA 

Stratified random sampling can be more efficient than simple random sampling when some classes are substantially less prevalent or important than others (Campbell 1987, p. 358; Congalton 1991). This section considers strata that are defined by remotely sensed classifications, and reference data that are a separate random sample of pixels (with replacement) within each stratum. This concept includes not only pre-stratification (e.g., Green et al. 1993), but also post-stratification of a simple random sample based on the remotely sensed classification. Since the stratum size in the total population is known without error for each remotely sensed category (through a computer census of classified pixels), pre- and post-stratification could potentially improve estimation precision in accuracy assessments and estimates of area in each category as defined by the protocol used for the reference data.

The current section assumes that each sample unit is classified into only one category by each classifier, which precludes reference data from cluster plots (Congalton 1991), such as photo-interpreted maps of sample areas (e.g., Czaplewski et al. 1987). The covariance matrix for the multinomial distribution, which is given in Eqs. 46, 47, and 104, is appropriate for simple random sampling, but must be used differently for stratified random sampling since sampling errors are independent among strata.


Table 7. Example of the covariance matrix assuming the null hypothesis of independence between classifiers, and intermediate matrices. The contingency table in table 2 is used.

[Panels: $\mathrm{Cov}_0(\mathrm{vec}\hat P)$ (Eq. 117); $P_{i\cdot}$ (Eq. 115); $P_{\cdot j}$ (Eq. 116).]
Let the rows (i.e., $i$ or $r$ subscripts) of the contingency table represent the true reference classifications, and the columns (i.e., $j$ or $s$ subscripts) represent the less-accurate classifications (e.g., remotely sensed categorizations). Assume pre-stratification of reference data is based on the remotely sensed classifications, which are available for all members of the population (e.g., all pixels in an image) before the sample of reference data is selected. In stratified random sampling, sampling errors between all pairs of strata (i.e., columns in the contingency table) are assumed to be mutually independent:


$$\mathrm{Cov}(\hat p_{ij}, \hat p_{rs}) = 0 \quad \text{for } j \ne s. \qquad [118]$$


Assume the size of each stratum $j$ (i.e., $p_{\cdot j}$) is known without error (e.g., a proportion based on a complete enumeration or census of all pixels for each remotely sensed classification). Let $n_{\cdot j}$ be the sample size of reference plots in the $j$th stratum, and $n_{ij}$ be the number of reference plots classified as category $i$ in the $j$th stratum. In this case,

$$\hat p_{ij} = p_{\cdot j}\,\frac{n_{ij}}{n_{\cdot j}}. \qquad [120]$$


The multinomial distribution provides the covariance matrix for sampling errors within each independent stratum $j$ (see Eqs. 46 and 47). This distribution with Eq. 120 produces:
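$$\mathrm{Var}(\hat p_{ij}) = \mathrm{Cov}(\hat p_{ij}, \hat p_{ij}) = \frac{p_{\cdot j}^2\,n_{ij}\,(n_{\cdot j} - n_{ij})}{n_{\cdot j}^3}, \qquad [121]$$

$$\mathrm{Cov}(\hat p_{ij}, \hat p_{rj}) = \frac{-p_{\cdot j}^2\,n_{ij}\,n_{rj}}{n_{\cdot j}^3}, \quad r \ne i. \qquad [122]$$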


The general variance approximation for $\hat\kappa_w$ is given in Eq. 25. Substituting Eqs. 118, 121, and 122 into Eq. 25, and noting that the fourth summation in Eq. 25 disappears because sampling errors are independent across strata, gives the variance estimator $\mathrm{Var}(\hat\kappa_w)$ for stratified random sampling (Eq. 123).
Accuracy Assessment Statistics Other Than $\hat\kappa_w$

The covariance matrix $\mathrm{Cov}_s(\mathrm{vec}\hat P)$ for the covariances $\mathrm{Cov}(\hat p_{ij}, \hat p_{rs})$ for stratified random sampling (Eqs. 121 and 122) can be expressed in matrix algebra for use with the matrix formulations of accuracy assessment statistics in this paper (e.g., Eqs. 75, 80, 93, 97, 101, 102, and 103).

First, the $k\times k$ matrix of estimated joint probabilities ($\hat P$) must be estimated from the $k\times k$ matrix of estimated conditional probabilities $\hat P_s$ from the stratified sample. The strata are defined by the classifications on the columns of $\hat P_s$; therefore, the column marginals all equal 1 (i.e., $\hat P_s'\mathbf 1 = \mathbf 1$). The strata sizes are assumed known without error (e.g., pixel counts, where the remotely sensed classifications are on the columns and are used for pre-stratification of the sample), and are represented by the $k\times 1$ vector of proportions of the domain that are in each stratum ($\boldsymbol\pi_s$, where $\mathbf 1'\boldsymbol\pi_s = 1$). $\hat P$ is estimated from $\hat P_s$ by multiplying each element in the $j$th column of $\hat P_s$ by the $j$th element of $\boldsymbol\pi_s$ (i.e., $\hat P = \hat P_s\,\mathrm{diag}(\boldsymbol\pi_s)$, consistent with Eq. 120), and then is used to define the $k^2\times 1$ vector version ($\mathrm{vec}\hat P$) of the $k\times k$ matrix $\hat P$.
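A minimal sketch of $\hat P = \hat P_s\,\mathrm{diag}(\boldsymbol\pi_s)$, assuming the stratified sample is summarized as a $k\times k$ array of counts with reference classes on the rows and strata (remotely sensed classes) on the columns. The function name and input layout are illustrative assumptions.

```python
import numpy as np


def joint_from_stratified(counts, stratum_props):
    """Estimate P and vecP from a stratified sample with strata on the columns.

    counts[i, j]: number of reference units in stratum j whose reference
    class is i. stratum_props[j]: known proportion of the domain in
    stratum j (pi_s, summing to 1).
    """
    counts = np.asarray(counts, dtype=float)
    pi_s = np.asarray(stratum_props, dtype=float)
    p_s = counts / counts.sum(axis=0)          # conditional proportions; each column sums to 1
    p_hat = p_s * pi_s                         # scale column j by pi_s[j], i.e., P = P_s diag(pi_s)
    return p_hat, p_hat.reshape(-1, order="F") # P and its column-stacked vecP
```

The column sums of the returned $\hat P$ reproduce the known stratum proportions, which is a quick check that the columns, not the rows, were scaled.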
Next, compute the covariance matrix for this estimate $\mathrm{vec}\hat P$. Let $\hat{\mathbf p}_j$ represent the $k\times 1$ vector in which the $i$th element is the observed proportion of category $i$ in stratum $j$. The $k\times k$ covariance matrix for the estimated proportions in the $j$th stratum, assuming the multinomial distribution, is:

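$$\mathrm{Cov}(\hat{\mathbf p}_j) = (1 - F_j)\left[\mathrm{diag}(\hat{\mathbf p}_j) - \hat{\mathbf p}_j\hat{\mathbf p}_j'\right]/n_{\cdot j}, \qquad [124]$$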

where $n_{\cdot j}$ is the sample size of units that are classified into one and only one category by each of the two classifiers in the $j$th stratum, and $\mathrm{diag}(\hat{\mathbf p}_j)$ is the $k\times k$ matrix with $\hat{\mathbf p}_j$ on its main diagonal and all other elements equal to zero. $(1 - F_j)$ in Eq. 124 is the finite population correction factor for stratum $j$. $F_j$ equals zero if sampling is with replacement, and is negligible if the population size is large relative to the sample size, which is the usual case in remote sensing. Equation 124 is closely related to Eq. 104 for simple random sampling. The joint probabilities in the $j$th column of the contingency table ($\hat P$) equal $\hat{\mathbf p}_j$ multiplied by the stratum size $p_{\cdot j}$. Since $p_{\cdot j}$ is known without error in the type of stratified random sampling being considered here,

$$\mathrm{Cov}_s(\mathrm{vec}\hat P) = \begin{bmatrix} p_{\cdot 1}^2\,\mathrm{Cov}(\hat{\mathbf p}_1) & 0 & \cdots & 0\\ 0 & p_{\cdot 2}^2\,\mathrm{Cov}(\hat{\mathbf p}_2) & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & p_{\cdot k}^2\,\mathrm{Cov}(\hat{\mathbf p}_k) \end{bmatrix}, \qquad [125]$$

where $\mathrm{Cov}(\hat{\mathbf p}_j)$ is defined in Eq. 124 and 0 is the $k\times k$ matrix of zeros.
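Equations 124 and 125 translate directly into a block-diagonal matrix. This sketch assumes the same count layout as above (reference classes on rows, strata on columns), column-stacked vec ordering, and $F_j = 0$ unless finite population corrections are supplied; the function name is hypothetical.

```python
import numpy as np


def stratified_cov_vecp(counts, stratum_props, fpc=None):
    """Eqs. 124-125 sketch: block-diagonal Cov_s(vecP) for stratified random sampling.

    counts[i, j]: reference units of class i in stratum j.
    stratum_props[j]: p_.j, known without error.
    fpc[j]: F_j (defaults to 0, i.e., sampling with replacement).
    """
    counts = np.asarray(counts, dtype=float)
    props = np.asarray(stratum_props, dtype=float)
    k = counts.shape[0]
    f = np.zeros(k) if fpc is None else np.asarray(fpc, dtype=float)
    cov = np.zeros((k * k, k * k))
    for j in range(k):
        n_j = counts[:, j].sum()
        pj = counts[:, j] / n_j                                         # observed proportions in stratum j
        cov_pj = (1.0 - f[j]) * (np.diag(pj) - np.outer(pj, pj)) / n_j  # Eq. 124
        block = slice(j * k, (j + 1) * k)                               # positions of column j within vecP
        cov[block, block] = props[j] ** 2 * cov_pj                      # Eq. 125; other blocks stay 0 (Eq. 118)
    return cov
```

Comparing this matrix with the simple-random-sampling form of Eq. 104, computed from the same data, is one way to examine the effect of stratification on a given statistic.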

Equations 124 and 125, when used with Eqs. 93 and 97 for conditional probabilities on the diagonal, agree with the results of Green et al. (1993) after transpositions to change stratification to the column classifier rather than the row classifier. Equations 124 and 125, when used with Eq. 43 for unweighted kappa ($\hat\kappa_w$ with $W = I$), agree with the unpublished results of Stephen Stehman (personal communication) after similar transpositions to change stratification to the column classifier. Congalton (1991) suggested testing the effect of stratified sampling with the variance estimator for simple random sampling, and Eq. 125 permits this comparison.


SUMMARY 


Tests of hypotheses with the estimated accuracy assessment statistics can require a variance estimate. Most existing variance estimators for accuracy assessment statistics assume that the multinomial distribution applies to the sampling design used to gather reference data. The multinomial distribution implies that this design is a simple random sample where each sample unit (e.g., a pixel) is separately classified into a single category by each classifier. This assumption is overly restrictive for many, perhaps most, accuracy assessments in remote sensing, where more complex sampling designs and different sample units are more practical or efficient. Unfortunately, variance estimators for simple random sampling are naively applied when other sampling designs are used (e.g., Stenback and Congalton 1990; Gong and Howarth 1990). This improper use of published variance estimators surely affects tests of hypotheses, although the typical magnitude of the problem is unknown (Stehman 1992).

The variance estimators for the weighted kappa statistic [Eqs. 24 and 43 for $\mathrm{Var}(\hat\kappa_w)$ and Eqs. 28 and 45 for $\mathrm{Var}_0(\hat\kappa_w)$]; the unweighted kappa statistic [Eq. 33 for $\mathrm{Var}(\hat\kappa)$ and Eq. 34 for $\mathrm{Var}_0(\hat\kappa)$]; the conditional kappa statistic [Eqs. 67 and 75 for $\mathrm{Var}(\hat\kappa_{i\cdot})$ and Eqs. 70 and 80 for $\mathrm{Var}(\hat\kappa_{\cdot i})$]; conditional probabilities [Eqs. 85, 87, 88, 89, 93, and 97 for $\mathrm{Var}(\hat p_{i|\cdot j})$ and $\mathrm{Var}(\hat p_{i|j\cdot})$]; and differences between diagonal conditional probabilities and their expected values under the independence assumption (Eqs. 101, 102, and 103) are the first step in correcting this problem. These equations form the basis for approximate variance estimators for other sampling situations, such as cluster sampling, systematic sampling (Wolter 1985 in Stehman 1992), and more complex designs (e.g., Czaplewski 1992). Stratified random sampling is an important design in accuracy assessments, and the more general variance estimators in this paper were used to construct the appropriate $\mathrm{Var}(\hat\kappa_w)$ in Eq. 123, and variance estimators for other accuracy assessment statistics using $\mathrm{Cov}_s(\mathrm{vec}\hat P)$ in Eq. 125. Rapid progress in assessments of classification accuracy with more complex sampling and estimation situations is expected based on the foundation provided in this paper.
APPENDIX A: Notation

Variable  Definition  Equation 

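$\alpha_{i\cdot|j}$  $p_{i\cdot} - p_{ij}$  58

$b_{ij}$  coefficients containing $p_{ij}$ in $p_c$  17, 21

$c_{ij}$  coefficients not containing $p_{ij}$ in $p_c$  18

$\mathrm{Cov}(\hat p_{ij}, \hat p_{ij})$  estimated variance for $\hat p_{ij}$, $\mathrm{Var}(\hat p_{ij})$  46

$\mathrm{Cov}(\hat p_{ij}, \hat p_{rs})$  estimated covariance between $\hat p_{ij}$ and $\hat p_{rs}$  47, 118

$\mathrm{Cov}(\hat{\mathbf p}_{(i|i\cdot)})$  $k\times k$ covariance matrix for the $k\times 1$ vector of estimated conditional probabilities ($\hat{\mathbf p}_{(i|i\cdot)}$) on the diagonal of the error matrix (conditioned on the row classification)  97

$\mathrm{Cov}(\hat{\mathbf p}_{(i|\cdot i)})$  $k\times k$ covariance matrix for the $k\times 1$ vector of estimated conditional probabilities ($\hat{\mathbf p}_{(i|\cdot i)}$) on the diagonal of the error matrix (conditioned on the column classification)  93

$\mathrm{Cov}(\mathrm{vec}\hat P)$  $k^2\times k^2$ covariance matrix for the estimate $\mathrm{vec}\hat P$  104, 105, 125

$\mathrm{Cov}(\hat\kappa_{i\cdot})$  covariance matrix for the conditional kappa statistics $\hat\kappa_{i\cdot}$  75

$\mathrm{Cov}(\hat\kappa_{\cdot i})$  covariance matrix for the conditional kappa statistics $\hat\kappa_{\cdot i}$  80

$\mathrm{Cov}_0(\hat{\mathbf p}_{\cdot i\_})$  $2k\times 2k$ covariance matrix for $\hat{\mathbf p}_{\cdot i\_}$  100

$\mathrm{Cov}_0(\hat{\mathbf p}_{(i|i\cdot)} - \hat{\mathbf p}_{\cdot i})$  $k\times k$ covariance matrix for differences between the observed conditional probabilities on the diagonal ($\hat{\mathbf p}_{(i|i\cdot)}$) and their expected values under the independence hypothesis  103

$\mathrm{Cov}_0(\hat{\mathbf p}_{(i|\cdot i)} - \hat{\mathbf p}_{i\cdot})$  $k\times k$ covariance matrix for differences between the observed conditional probabilities on the diagonal ($\hat{\mathbf p}_{(i|\cdot i)}$) and their expected values under the independence hypothesis  101, 102

$\mathrm{Cov}_0(\mathrm{vec}\hat P)$  $k^2\times k^2$ covariance matrix for the estimate $\mathrm{vec}\hat P$ under the null hypothesis of independence between the row and column classifiers  45

$\mathrm{Cov}_s(\mathrm{vec}\hat P)$  $k^2\times k^2$ covariance matrix for the estimate $\mathrm{vec}\hat P$ for stratified random sampling  125

$k^2\times 1$ vector containing the first-order Taylor series approximation of $\hat\kappa_w$ or $\hat\kappa$  42, 43

$\mathbf D_{i\cdot}$  $k^2\times k$ matrix of zeros and ones defined in Eq. 99  99

$\mathbf D_{(i|i\cdot)}$  $k^2\times k$ intermediate matrix used to compute $\mathrm{Var}(\hat{\mathbf p}_{(i|i\cdot)})$, where $(\mathrm{vec}\hat P)'\mathbf D_{(i|i\cdot)}$ is the linear approximation of $\hat{\mathbf p}_{(i|i\cdot)}$  96

$\mathbf D_{(i|\cdot i)}$  $k^2\times k$ intermediate matrix used to compute $\mathrm{Var}(\hat{\mathbf p}_{(i|\cdot i)})$, where $(\mathrm{vec}\hat P)'\mathbf D_{(i|\cdot i)}$ is the linear approximation of $\hat{\mathbf p}_{(i|\cdot i)}$  92

$\mathbf D_{(i|i\cdot)-}$  $k^2\times k$ intermediate matrix, equal to $\mathbf D_{i\cdot\_}\,[\,I \mid -I\,]'$, used to compute $\mathrm{Cov}_0(\hat{\mathbf p}_{(i|i\cdot)} - \hat{\mathbf p}_{\cdot i})$  103

$\mathbf D_{(i|\cdot i)-}$  $k^2\times k$ intermediate matrix, equal to $\mathbf D_{\cdot i\_}\,[\,I \mid -I\,]'$, used to compute $\mathrm{Cov}_0(\hat{\mathbf p}_{(i|\cdot i)} - \hat{\mathbf p}_{i\cdot})$  102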

$\mathbf d_{\kappa 0}$  $k^2\times 1$ vector containing the first-order Taylor series approximation of $\hat\kappa_w$ or $\hat\kappa$ under the null hypothesis of independence between the row and column classifiers  44

$\mathbf D_{\cdot i\_}$  $k^2\times 2k$ matrix equal to $[\mathbf D_{(i|\cdot i)}\ \mathbf D_{i\cdot}]$, used to compute $\mathrm{Cov}_0(\hat{\mathbf p}_{\cdot i\_})$  100

$\mathbf d_{\kappa i\cdot}$  $k^2\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{i\cdot})$  74

$\mathbf d_{\kappa\cdot i}$  $k^2\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{\cdot i})$  79

diag A  $k\times 1$ vector containing the diagonal of the $k\times k$ matrix A

$E[\cdot]$  expectation operator

$E_0[\varepsilon_{ij}\varepsilon_{rs}]$  expected covariance between cells $ij$ and $rs$ in the contingency table under the null hypothesis of independence between classifiers  113, 114, 117

$F$  finite population correction factor for covariance matrix under simple random sampling  104

$F_j$  finite population correction factor for stratum $j$ in covariance matrix for stratified random sampling  124

$G_{i\cdot}$  $k\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{i\cdot})$  73

$G_{\cdot i}$  $k\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{\cdot i})$  78

$G_{p(i|\cdot i)}$  $k\times k$ intermediate matrix used to compute $\mathrm{Var}(\hat{\mathbf p}_{(i|\cdot i)})$  91

$G_{p(i|i\cdot)}$  $k\times k$ intermediate matrix used to compute $\mathrm{Var}(\hat{\mathbf p}_{(i|i\cdot)})$  95

$H_{i\cdot}$  $k\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{i\cdot})$  71

$H_{p(i|\cdot i)}$  $k\times k$ intermediate matrix used to compute $\mathrm{Var}(\hat{\mathbf p}_{(i|\cdot i)})$  90

$H_{p(i|i\cdot)}$  $k\times k$ intermediate matrix used to compute $\mathrm{Var}(\hat{\mathbf p}_{(i|i\cdot)})$  94

$H_{\cdot i}$  $k\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{\cdot i})$  76

$I$  the $k\times k$ identity matrix in which $I_{ij} = 1$ for $i = j$ and $I_{ij} = 0$ otherwise

$i$  row subscript for contingency table  6

$j$  column subscript for contingency table  6

$k$  number of categories in the classification system  6

$M_{i\cdot}$  $k\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{i\cdot})$  72

$M_{\cdot i}$  $k\times k$ matrix used in the matrix computation of $\mathrm{Cov}(\hat\kappa_{\cdot i})$  77

$n_{\cdot j}$  sample size of reference plots in the $j$th stratum  119

$p_c$  matching proportion expected assuming independence between the row and column classifiers  5

$\hat p_c$  matching proportion expected assuming independence between the row and column classifiers (estimated)  7, 30

$p_{ij}$  $ij$th proportion in contingency table  6, 26

$\hat p_{ij}$  estimated $ij$th proportion in contingency table  7

$p_{i\cdot}$  row marginal of contingency table  2, 32

$\mathbf p_{i\cdot}$  $k\times 1$ vector in which the $i$th element is $p_{i\cdot}$  35, 99

$\hat{\mathbf p}_j$  $k\times 1$ vector in which the $i$th element is the observed proportion of category $i$ in stratum $j$ (used for stratified random sampling example)  125

$\hat{\mathbf p}_{\cdot i\_}$  $2k\times 1$ vector $(\hat{\mathbf p}_{(i|\cdot i)}',\ \hat{\mathbf p}_{i\cdot}')'$ containing the observed and expected conditional probabilities on the diagonal of the error matrix, conditioned on the column classification  100

$p_o$  proportion matching classifications  4, 27

$\hat p_o$  estimated proportion matching classifications  7, 29

$p_{\cdot j}$  column marginal of contingency table  3, 31, 120, 125

$\mathbf p_{\cdot j}$  $k\times 1$ vector in which the $j$th element is $p_{\cdot j}$  36

$p_{i|j\cdot}$  conditional probability that the column classification is category $i$ given that the row classification is category $j$  86

$p_{i|\cdot j}$  conditional probability that the row classification is category $i$ given that the column classification is category $j$  81

$\mathbf p_{(i|\cdot i)}$  $k\times 1$ vector of diagonal conditional probabilities with its $i$th element equal to $p_{i|\cdot i}$  92

$P$  $k\times k$ matrix (i.e., the error matrix in remote sensing jargon) in which the $ij$th element of $P$ is the scalar $p_{ij}$  35, 36

$P_c$  $E[P]$ under null hypothesis of independence between the row and column classifiers  37

$P_{i\cdot}$  $k^2\times k^2$ intermediate matrix used to compute $\mathrm{Cov}_0(\mathrm{vec}\hat P)$  115, 117

$P_i$  $k\times k$ intermediate matrix used to compute $\mathrm{Cov}_0(\mathrm{vec}\hat P)$  115

$P_{\cdot j}$  $k^2\times k^2$ intermediate matrix used to compute $\mathrm{Cov}_0(\mathrm{vec}\hat P)$  116, 117

$R$  remainder in Taylor series expansion  10

$\mathrm{Var}(\hat p_{ij})$  estimated variance for $\hat p_{ij}$, $\mathrm{Cov}(\hat p_{ij}, \hat p_{ij})$  46

$\mathrm{Var}(\hat p_{i|j\cdot})$  estimated variance of random errors for estimating conditional probability $p_{i|j\cdot}$ (conditioned on row $j$)  87, 89

$\mathrm{Var}(\hat p_{i|\cdot j})$  estimated variance of random errors for estimating conditional probability $p_{i|\cdot j}$ (conditioned on column $j$)  85, 88

$\mathrm{Var}(\hat\kappa)$  estimated variance of random errors for kappa  33

$\mathrm{Var}(\hat\kappa_{i\cdot})$  estimated variance of random errors for conditional kappa, conditioned on row classifier  67

$\mathrm{Var}_0(\hat\kappa_{i\cdot})$  estimated variance of random errors for conditional kappa, conditioned on row classifier, under the null hypothesis of independence between classifiers  67

$\mathrm{Var}(\hat\kappa_{\cdot i})$  estimated variance of random errors for conditional kappa, conditioned on column classifier  70

$\mathrm{Var}_0(\hat\kappa_{\cdot i})$  estimated variance of random errors for conditional kappa, conditioned on column classifier, under the null hypothesis of independence between classifiers  70

$\mathrm{Var}(\hat\kappa_w)$  variance of random estimation errors for weighted kappa  9

$\mathrm{Var}(\hat\kappa_w)$  estimated variance of random errors for weighted kappa  25

$\mathrm{Var}_0(\hat\kappa_w)$  estimated variance of random errors for weighted kappa under the null hypothesis of independence between the row and column classifiers (i.e., $\kappa_w = 0$)  28

$\mathrm{Var}_0(\hat\kappa)$  estimated variance of random errors for unweighted kappa under the null hypothesis of independence between the row and column classifiers (i.e., $\kappa = 0$)  28

vec A  the $k^2\times 1$ vector version of the $k\times k$ matrix A. If $\mathbf a_j$ is the $k\times 1$ column vector in which the $i$th element equals $a_{ij}$, then $A = [\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_k]$, and $\mathrm{vec}A = [\mathbf a_1'\ \mathbf a_2'\ \cdots\ \mathbf a_k']'$  42, 104

$w_{ij}$  weight placed on agreement between category $i$ under the first classification protocol, and category $j$ under the second protocol  6

$w_{i\cdot}$  weighted average of the weights in the $i$th row  22, 31

$\mathbf w_{i\cdot}$  $k\times 1$ vector used in the matrix computation of $\hat\kappa$  40

$w_{\cdot j}$  weighted average of the weights in the $j$th column  23, 32

$\mathbf w_{\cdot j}$  $k\times 1$ vector used in the matrix computation of $\hat\kappa$  41

$W$  $k\times k$ matrix in which the $ij$th element is $w_{ij}$  38, 37

$\varepsilon_\kappa^2$  squared error in estimated kappa $(\hat\kappa_w - \kappa_w)^2$  12

$\varepsilon_\kappa$  random error in estimated kappa $(\hat\kappa_w - \kappa_w)$  8, 11

$\varepsilon_{ij}$  $(\hat p_{ij} - p_{ij})$  10

$\kappa_{i\cdot}$  conditional kappa for row category $i$  56

$\kappa_{\cdot i}$  conditional kappa for column category $i$  68

$\kappa_w$  weighted kappa statistic  6

$\hat\kappa_w$  weighted kappa statistic (estimated)  7

0  $k\times k$ matrix of zeros  115


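$\left.\partial\kappa/\partial p_{ij}\right|_{p_{ij}=\hat p_{ij}}$  partial derivative of $\kappa$ with respect to $p_{ij}$ evaluated at $p_{ij} = \hat p_{ij}$ for all $i$, $j$  20, 83

$\otimes$  element-by-element multiplication, where the $ij$th element of $(A \otimes B)$ is $a_{ij}b_{ij}$, and matrices A and B have the same dimensions  38, 37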
The Rocky Mountain Station is one of eight regional experiment stations, plus the Forest Products Laboratory and the Washington Office Staff, that make up the Forest Service research organization.

RESEARCH  FOCUS 

Research  programs  at  the  Rocky  Mountain 
Station  are  coordinated  with  area  universities  and 
with  other  institutions.  Many  studies  are 
conducted  on  a  cooperative  basis  to  accelerate 
solutions  to  problems  involving  range,  water, 
wildlife  and  fish  habitat,  human  and  community 
development,  timber,  recreation,  protection,  and 
multiresource  evaluation. 

RESEARCH  LOCATIONS 

Research  Work  Units  of  the  Rocky  Mountain 
Station  are  operated  in  cooperation  with 
universities  in  the  following  cities: 


Albuquerque,  New  Mexico 
Flagstaff,  Arizona 
Fort  Collins,  Colorado* 
Laramie,  Wyoming 
Lincoln,  Nebraska 
Rapid  City,  South  Dakota 


*Station Headquarters: 240 W. Prospect Rd., Fort Collins, CO 80526


